Abstract. Clarke and Barron have recently shown that the Jeffreys' invariant prior of Bayesian theory yields the common asymptotic (minimax and maximin) redundancy of universal data compression in a parametric setting. We seek a possible analogue of this result for the two-level quantum systems. We restrict our considerations to prior probability distributions belonging to a certain one-parameter family, $q_u$, $-\infty < u < 1$. Within this setting, we are able to compute exact redundancy formulas, for which we find the asymptotic limits. We compare our quantum asymptotic redundancy formulas to those derived by naively applying the classical counterparts of Clarke and Barron, and find certain common features. Our results are based on formulas we obtain for the eigenvalues and eigenvectors of $2^n \times 2^n$ (Bayesian density) matrices, $\zeta_n(u)$. These matrices are the weighted averages (with respect to $q_u$) of all possible tensor products of $n$ identical $2 \times 2$ density matrices, representing the two-level quantum systems.

We propose a form of universal coding for the situation in which the density matrix describing an ensemble of quantum signal states is unknown. A sequence of $n$ signals would be projected onto the dominant eigenspaces of $\zeta_n(u)$.

Index terms: quantum information theory, two-level quantum systems, universal data compression, asymptotic redundancy, Jeffreys' prior, Bayes redundancy, Schumacher compression, ballot paths, Dyck paths, relative entropy, Bayesian density matrices, quantum coding, Bayes codes, monotone metric, symmetric logarithmic derivative, Kubo-Mori/Bogoliubov metric



1. Introduction

In recent years, there have been a considerable number of important developments 
in the extension of (classical) information-theoretic concepts to a quantum-mechanical 
setting. Bennett and Shor have surveyed this progress in the outstanding Commemorative Issue 1948-1998 of the IEEE Transactions on Information Theory. In particular, they pointed out (in strict analogy to the classical case, successfully studied some fifty years ago by Shannon in famous landmark work) that quantum data compression allows signals from a redundant quantum source to be compressed into a bulk approaching the source's (quantum) entropy. Bennett and Shor did not, however, discuss the intriguing case which arises when the specific nature of the quantum source is unknown. This, of course, corresponds to the classical question of universal coding or data compression (see […, Sec. II.E]).

We do address this interesting issue here, by investigating whether or not it is possible to extend to the quantum domain recent (classical) seminal results of Clarke and Barron [16], [17], [18]. They, in fact, derived various forms of asymptotic redundancy of universal data compression for parameterized families of probability distributions. Their analyses provide a rigorous basis for the reference prior method in Bayesian statistical analysis. For an extensive commentary on the results of Clarke and Barron, see […]. Also see [15] for some recent related research, as well as a discussion of various rationales that have been employed for using the (classical) Jeffreys' prior (a possible quantum counterpart of which will be of interest here) for Bayesian purposes, cf. […]. Let us also bring to the attention of the reader that in a brief review […] of […], the noted statistician I. J. Good commented that Clarke and Barron "have presumably overlooked the reviewer's work" and cited, in this regard, [26], [27].¹

*Research supported in part by the MSRI, Berkeley.



Let us briefly recall the basic setup and the results of Clarke and Barron that are 
relevant to the analyses of our paper. Clarke and Barron work in a noninformative 
Bayesian framework, in which we are given a parametric family of probability densities $\{P_\theta : \theta \in \Theta \subset \mathbb{R}^d\}$ on a space $\mathcal{X}$. These probability densities generate independent identically distributed random variables $X_1, X_2, \dots, X_n$, which, for a fixed $\theta$, we consider as producing strings of length $n$ according to the probability density $P_\theta^n$ of the $n$-fold product of probability distributions. Now suppose that Nature picks a $\theta$ from $\Theta$, that is, a joint density $P_\theta^n$ on the product space $\mathcal{X}^n$ of strings $(X_1, X_2, \dots, X_n)$ of length $n$. On the other hand, a Statistician chooses a distribution $Q_n$ on $\mathcal{X}^n$ as his best guess of $P_\theta^n$. Of course, there is a loss of information, which is measured by the total relative entropy $D(P_\theta^n\|Q_n)$, where $D(P\|Q)$ is the Kullback-Leibler divergence of $P$ and $Q$ (the relative entropy of $P$ with respect to $Q$). For finite $n$, and for a given prior $w(\theta)\,d\theta$ on $\Theta$, by a result of Aitchison […, pp. 549/550], the best strategy $Q_n$ to minimize the average risk $\int D(P_\theta^n\|Q_n)\,w(\theta)\,d\theta$ is to choose for $Q_n$ the mixture density $M_n = \int P_\theta^n\,w(\theta)\,d\theta$. This is called a Bayes procedure or a Bayes strategy.

The quantities corresponding to such a procedure that must be investigated are the risk (redundancy) of the Bayes strategy, $D(P_\theta^n\|M_n)$, and the Bayes risk, the average of risks, $\int D(P_\theta^n\|M_n)\,w(\theta)\,d\theta$. The Bayes risk equals Shannon's mutual information $I(\Theta; X^n)$ (see […]). Moreover, the Bayes risk is bounded above by the minimax redundancy $\min_{Q_n}\max_{\theta\in\Theta} D(P_\theta^n\|Q_n)$. In fact, by a result of Gallager […] and Davisson and Leon-Garcia […] (see […] for a generalization), for each fixed $n$ there is a prior which realizes this upper bound, i.e., the maximin redundancy $\max_w \int D(P_\theta^n\|M_n)\,w(\theta)\,d\theta$ and the minimax redundancy are the same. Such a prior $w_n$ is called capacity achieving or least favorable.

Clarke and Barron investigate the above-mentioned quantities asymptotically, that is, for $n$ tending to infinity. First of all, in […, (1.4)], […, (2.1b)], they show that the redundancy $D(P_\theta^n\|M_n)$ of the Bayes strategy is asymptotically
$$\frac{d}{2}\log\frac{n}{2\pi e} + \frac12\log\det I(\theta) - \log w(\theta) + o(1), \tag{1.1}$$
as $n$ tends to infinity. Here, $I(\theta)$ is the $d \times d$ Fisher information matrix (the negative of the expected value of the Hessian of the logarithm of the density function). (Although the binary logarithm is usually used in the quantum coding literature, we employ the natural logarithm throughout this paper, chiefly to facilitate comparisons of our results with those of Clarke and Barron [16], [17], [18].) For priors supported on a compact subset $K$ in the interior of the domain of parameters, the asymptotic minimax redundancy $\min_{Q_n}\max_{\theta\in K} D(P_\theta^n\|Q_n)$ was shown to be […, (2.4)], […]
$$\frac{d}{2}\log\frac{n}{2\pi e} + \log\int_K\sqrt{\det I(\theta)}\,d\theta + o(1). \tag{1.2}$$
Moreover […, (2.6)], it is Jeffreys' prior $w^* = \sqrt{\det I(\theta)}/c$ (with $c = \int_K \sqrt{\det I(\theta)}\,d\theta$ a normalizing constant; see also […]) which is the unique continuous and positive prior on $K$ which is asymptotically least favorable, i.e., for which the asymptotic maximin redundancy achieves the value (1.2). In particular, asymptotically the maximin and minimax redundancies are the same.

¹It should be noted that in these papers, Good uses a more general objective function (a two-parameter utility) than the relative entropy, chosen by Clarke and Barron over alternative measures […, p. 454]. Good does conclude that Jeffreys' invariant prior is the minimax, that is, the least favorable, prior when the utility is the "weight of the evidence" in the sense of C. S. Peirce, that is, the relative entropy.
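As a classical illustration of (1.1) and (1.2), the minimal Python sketch below (helper names are ours, not from the cited papers) computes the exact Bayes redundancy $D(P_\theta^n\|M_n)$ for i.i.d. Bernoulli($\theta$) strings when $M_n$ is the mixture under the Jeffreys' prior Beta(1/2, 1/2). Here $d = 1$, $I(\theta) = 1/(\theta(1-\theta))$ and $\int\sqrt{I(\theta)}\,d\theta = \pi$, so (1.1) with $w = w^*$ and (1.2) both reduce to $\frac12\log\frac{n}{2\pi e} + \log\pi$. This is only a sanity check under these assumptions.

```python
import math


def bayes_redundancy_bernoulli(n, theta):
    """Exact D(P_theta^n || M_n) in nats, M_n = mixture under Jeffreys' Beta(1/2, 1/2) prior."""
    log_b_half = 2 * math.lgamma(0.5)  # log B(1/2, 1/2) = log(pi)
    total = 0.0
    for k in range(n + 1):  # k = number of ones in the string
        # log-probability of one fixed string with k ones under P_theta^n
        log_p = k * math.log(theta) + (n - k) * math.log(1 - theta)
        # log-probability of the same string under the mixture M_n
        log_m = (math.lgamma(k + 0.5) + math.lgamma(n - k + 0.5)
                 - math.lgamma(n + 1.0) - log_b_half)
        # probability of observing some string with exactly k ones
        log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        total += math.exp(log_binom + log_p) * (log_p - log_m)
    return total


def jeffreys_asymptotic(n):
    """(1.1) with w = Jeffreys' prior, equivalently (1.2), for d = 1; o(1) dropped."""
    return 0.5 * math.log(n / (2 * math.pi * math.e)) + math.log(math.pi)


for theta in (0.1, 0.3, 0.5):
    print(theta, bayes_redundancy_bernoulli(1000, theta), jeffreys_asymptotic(1000))
```

The exact values should agree with the asymptotic one to within the $o(1)$ term and be nearly independent of $\theta$, which is the equalizer behavior behind the least-favorable property of Jeffreys' prior.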

In obvious contrast to classical information theory, quantum information theory directly relies upon the fundamental principles of quantum mechanics. This is due to the fact that the basic unit of quantum computing, the "quantum bit" or "qubit," is typically a (two-state) microscopic system, possibly an atom or nuclear spin or polarized photon, the behavior of which (e.g., entanglement, interference, superposition, stochasticity, ...) can only be accurately explained using the rules of quantum theory. We refer the reader to […] for a comprehensive introduction to these matters (including the subjects of quantum error-correcting codes and quantum cryptography). Here, we shall restrict ourselves to describing, in mathematical terms, the basic notions of quantum information theory, how they pertain to data compression, and in what manner they parallel the corresponding notions from classical information theory.

In quantum information theory, the role of probability densities is played by density 
matrices, which are, by definition, nonnegative definite Hermitian matrices of unit trace, 
and which can be considered as operators acting on a (finite-dimensional) Hilbert space. 
Any probability density on a (finite) set $X = \{x_1, x_2, \dots, x_m\}$, where the probability of $x_i$ equals $p_i$, is representable in this framework by a diagonal matrix $\operatorname{diag}(p_1, p_2, \dots, p_m)$ (which is quite clearly itself a nonnegative definite Hermitian matrix with unit trace). Given two density matrices $\rho_1$ and $\rho_2$, the quantum counterpart of the relative entropy, that is, the relative entropy of $\rho_1$ with respect to $\rho_2$, is […], […] (cf. […])
$$S(\rho_1, \rho_2) = \operatorname{Tr}\rho_1(\log\rho_1 - \log\rho_2), \tag{1.3}$$
where the logarithm of a matrix $\rho$ is defined as $\log\rho = \sum_{k\ge1}(-1)^{k-1}(\rho - I)^k/k$, with $I$ the appropriate identity matrix. (Alternatively, if $\rho$ acts diagonally on a basis $\{v_1, v_2, \dots, v_m\}$ of the Hilbert space by $\rho v_i = \lambda_i v_i$, then $\log\rho$ acts by $(\log\rho)v_i = (\log\lambda_i)v_i$, $i = 1, 2, \dots, m$.) Clearly, if $\rho_1$ and $\rho_2$ are diagonal matrices, corresponding to classical probability densities, then (1.3) reduces to the usual Kullback-Leibler divergence.
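The eigenvalue description of $\log\rho$ translates directly into code. The following is a minimal NumPy sketch (function names are ours) for full-rank density matrices, where all eigenvalues are strictly positive; it checks the reduction of (1.3) to the Kullback-Leibler divergence in the diagonal case.

```python
import numpy as np


def logm_psd(rho):
    """log of a positive definite Hermitian matrix: apply log to its eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(rho)
    return eigvecs @ np.diag(np.log(eigvals)) @ eigvecs.conj().T


def quantum_relative_entropy(rho1, rho2):
    """S(rho1, rho2) = Tr rho1 (log rho1 - log rho2), formula (1.3), in nats."""
    value = np.trace(rho1 @ (logm_psd(rho1) - logm_psd(rho2)))
    return float(value.real)


def kl_divergence(p, q):
    """Classical Kullback-Leibler divergence in nats."""
    return float(sum(pi * np.log(pi / qi) for pi, qi in zip(p, q)))


p, q = [0.2, 0.3, 0.5], [0.4, 0.4, 0.2]
print(quantum_relative_entropy(np.diag(p), np.diag(q)), kl_divergence(p, q))
```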

As we said earlier, our goal is to examine the possibility of extending the results 
of Clarke and Barron to quantum theory. That is, first of all we have to replace the 
(classical) probability densities $P_\theta$ by density matrices. We are not able to proceed in complete generality, but rather we will restrict ourselves to considering the first nontrivial case, that is, we will replace $P_\theta$ by $2 \times 2$ density matrices. Such matrices can be written in the form
$$\rho = \frac12\begin{pmatrix} 1+z & x-iy \\ x+iy & 1-z \end{pmatrix}, \tag{1.4}$$
where, in order to guarantee nonnegative definiteness, the points $(x, y, z)$ must lie within the unit ball ("Bloch sphere" […]): $x^2 + y^2 + z^2 \le 1$. (The points on the bounding spherical surface, $x^2 + y^2 + z^2 = 1$, corresponding to the pure states, will be shown to exhibit nongeneric behavior; see (2.39) and the respective comments in Sec. […] (cf. […]).) Such $2 \times 2$ density matrices correspond, in a one-to-one fashion, to the standard (complex) two-level quantum systems, notably those of spin-1/2 (electrons, protons, ...) and massless spin-1 particles (photons). These systems carry the basic units of quantum computing, the quantum bits. (If we set $x = y = 0$ in (1.4), we recover a classical binomial distribution, with the probability of "success", say, being $(1+z)/2$ and of "failure", $(1-z)/2$. Setting either $x$ or $y$ to zero puts us in the framework of real, as opposed to complex, quantum mechanics.)

The quantum analogue of the product of (classical) probability distributions is the 
tensor product of density matrices. (Again, it is easily seen that, for diagonal matrices, 
this reduces to the classical product.) Hence, we will replace $P_\theta^n$ by the tensor products $\bigotimes^n \rho$, where $\rho$ is a $2 \times 2$ density matrix (1.4). These tensor products are $2^n \times 2^n$ matrices, and can be used to compute (via the fundamental rule that the expected value of an observable is the trace of the matrix product of the observable and the density matrix; see […]) the probability of strings of quantum bits of length $n$.
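A minimal NumPy sketch (names are ours) of (1.4) and of its tensor powers. Since (1.4) has trace 1 and determinant $(1-r^2)/4$ with $r = \sqrt{x^2+y^2+z^2}$, its eigenvalues are $(1 \pm r)/2$, which is why nonnegative definiteness is equivalent to $r \le 1$. For $x = y = 0$ the diagonal of $\bigotimes^n\rho$ lists the classical product probabilities.

```python
from functools import reduce

import numpy as np


def bloch_density(x, y, z):
    """The 2x2 matrix (1.4) for the point (x, y, z)."""
    return 0.5 * np.array([[1 + z, x - 1j * y], [x + 1j * y, 1 - z]])


def tensor_power(rho, n):
    """n-fold tensor (Kronecker) product of rho with itself, a 2^n x 2^n matrix."""
    return reduce(np.kron, [rho] * n)


rho = bloch_density(0.3, -0.4, 0.5)
r = np.sqrt(0.3 ** 2 + 0.4 ** 2 + 0.5 ** 2)
print(np.linalg.eigvalsh(rho), [(1 - r) / 2, (1 + r) / 2])  # eigenvalues (1 -/+ r)/2
print(np.linalg.eigvalsh(bloch_density(0.9, 0.0, 0.9))[0])   # negative: point outside the ball
```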



In [50], [51] it was argued that the quantum Fisher information matrix (requiring, due to noncommutativity, the computation of symmetric logarithmic derivatives²) for the density matrices (1.4) should be taken to be of the form
$$\frac{1}{1-x^2-y^2-z^2}\begin{pmatrix} 1-y^2-z^2 & xy & xz \\ xy & 1-x^2-z^2 & yz \\ xz & yz & 1-x^2-y^2 \end{pmatrix}. \tag{1.5}$$

The quantum counterpart of the Jeffreys' prior was, then, taken to be the normalized form (dividing by $\pi^2$) of the square root of the determinant of (1.5), that is,
$$\frac{(1-x^2-y^2-z^2)^{-1/2}}{\pi^2}. \tag{1.6}$$
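The matrix (1.5) equals $\big((1-r^2)I + vv^T\big)/(1-r^2)$ with $v = (x, y, z)^T$ and $r^2 = x^2+y^2+z^2$, so its determinant is $(1-r^2)^{-1}$, and the normalizing constant in (1.6) is $\int (1-r^2)^{-1/2}\,dx\,dy\,dz = \pi^2$. A minimal numerical check (names are ours; midpoint rule after the substitution $r = \sin t$):

```python
import math

import numpy as np


def sld_fisher_matrix(x, y, z):
    """Quantum Fisher information matrix (1.5) at the point (x, y, z), r < 1."""
    v = np.array([x, y, z], dtype=float)
    s = 1.0 - v @ v  # 1 - x^2 - y^2 - z^2
    return (s * np.eye(3) + np.outer(v, v)) / s


def jeffreys_normalizer(num=20000):
    """Integral of (1 - r^2)^(-1/2) over the unit ball. With r = sin(t):
    4*pi * int_0^1 r^2 (1 - r^2)^(-1/2) dr = 4*pi * int_0^{pi/2} sin(t)^2 dt."""
    h = (math.pi / 2) / num
    return 4 * math.pi * h * sum(math.sin((i + 0.5) * h) ** 2 for i in range(num))


print(np.linalg.det(sld_fisher_matrix(0.2, -0.5, 0.4)), 1 / (1 - 0.45))
print(jeffreys_normalizer(), math.pi ** 2)
```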

On the basis of the above-mentioned result of Clarke and Barron that the Jeffreys' 
prior yields the asymptotic common minimax and maximin redundancy, it was conjectured […] that its assumed quantum counterpart (1.6) would have similar properties, as well.



To examine this possibility, (1.6) was embedded as a specific member ($u = 1/2$) of a one-parameter family of spherically-symmetric/unitarily-invariant probability densities (i.e., under unitary transformations of $\rho$, the assigned probability is invariant),
$$q_u = q_u(x, y, z) := \frac{\Gamma(5/2 - u)}{\pi^{3/2}\,\Gamma(1-u)}\,(1 - x^2 - y^2 - z^2)^{-u}, \qquad -\infty < u < 1. \tag{1.7}$$



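²Following […], in order to derive (1.5), one must find the symmetric logarithmic derivatives $(L_x, L_y, L_z)$ satisfying
$$\frac{\partial\rho}{\partial\beta} = \frac12\left(\rho L_\beta + L_\beta\rho\right), \qquad \beta = x, y, z,$$
and then compute the entries of (1.5) in the form […, eqs. (2), (3)]
$$\operatorname{Tr}\left[\rho\,(L_\beta L_\gamma + L_\gamma L_\beta)/2\right], \qquad \beta, \gamma = x, y, z.$$
For a well-motivated discussion of these formulas and the manner in which classical and quantum Fisher information are related, see […].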
(Embeddings of (1.6) in other (possibly, multiparameter) families are, of course, possible and may be pursued in further research. In this regard, see Theorem […] in Sec. […].) For $u = 0$, we obtain a uniform distribution over the unit ball. (This has been used as a prior over the two-level quantum systems, at least, in one study […].) For $u \to 1$, the uniform distribution over the spherical boundary (the locus of the pure states) is approached. (This is often employed as a prior, for example […].) For $u \to -\infty$, a Dirac distribution concentrated at the origin (corresponding to the fully mixed state) is approached.
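The normalization in (1.7) can be checked numerically. The minimal sketch below (names are ours) integrates $q_u$ over the unit ball for a few values $u \le 1/2$, where the integrand after the substitution $r = \sin t$ is bounded, and confirms that $u = 1/2$ reproduces (1.6) while $u = 0$ gives the constant density $3/(4\pi)$ of the uniform distribution on the ball.

```python
import math


def q_u(r, u):
    """Density (1.7) at a point of the unit ball at distance r < 1 from the origin."""
    return math.gamma(2.5 - u) / (math.pi ** 1.5 * math.gamma(1 - u)) * (1 - r * r) ** (-u)


def total_mass(u, num=20000):
    """4*pi * int_0^1 r^2 q_u(r) dr, by the midpoint rule after r = sin(t);
    the integrand becomes a constant times sin(t)^2 cos(t)^(1 - 2u)."""
    h = (math.pi / 2) / num
    acc = 0.0
    for i in range(num):
        t = (i + 0.5) * h
        acc += math.sin(t) ** 2 * q_u(math.sin(t), u) * math.cos(t)
    return 4 * math.pi * h * acc


for u in (-2.0, 0.0, 0.5):
    print(u, total_mass(u))  # each should be 1 up to quadrature error
```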

For a treatment in our setting that is analogous to that of Clarke and Barron, we average $\bigotimes^n \rho$ with respect to $q_u$. Doing so yields a one-parameter family of $2^n \times 2^n$ Bayesian density matrices […], […],
$$\zeta_n(u) = \int \Big(\bigotimes^n \rho\Big)\, q_u(x, y, z)\,dx\,dy\,dz,$$
$-\infty < u < 1$, which are the analogues of the mixtures $M_n$, and which exhibit highly interesting properties.

Now, still following Clarke and Barron, we have to compute the analogue of the risk $D(P_\theta^n\|M_n)$, i.e., the relative entropy $S(\bigotimes^n\rho, \zeta_n(u))$. Keeping the definition (1.3) in mind, this requires us to explicitly find the eigenvalues and eigenvectors of the matrices $\zeta_n(u)$, which we do in Sec. 2.2. Subsequently, in Sec. 2.3, we determine explicitly the relative entropy of $\bigotimes^n\rho$ with respect to $\zeta_n(u)$. We do this by using identities for hypergeometric series and some combinatorics. (It is also possible to obtain some of our results by making use of the representation theory of $SU(2)$. An even more general result was derived by combining these two approaches. We comment on this issue at the end of Sec. […].) On the basis of these results, we then address the question of finding asymptotic estimations in Secs. 2.4 and 2.5. These, in turn, form the basis for examining to what degree the results of Clarke and Barron are capable of extension to the quantum domain.
Let us (naively) attempt to apply the formulas of Clarke and Barron […], […] ((1.1) and (1.2) above) to the quantum context under investigation here. We do this by setting $d$ to 3 (the dimensionality of the unit ball, which we take as $K$), $\det I(\theta)$ to $(1 - x^2 - y^2 - z^2)^{-1}$ (the determinant of the quantum Fisher information matrix (1.5)), so that $\int_K \sqrt{\det I(\theta)}\,d\theta$ is $\pi^2$, and $w(\theta)$ to $q_u(x, y, z)$. Then, we obtain from the expression for the asymptotic redundancy (1.1),
$$\frac32(\log n - \log 2 - 1) - \Big(\frac12 - u\Big)\log(1 - r^2) + \log\Gamma(1-u) - \log\Gamma\Big(\frac52 - u\Big) + o(1), \tag{1.8}$$
where $r = \sqrt{x^2 + y^2 + z^2}$, and from the expression for the asymptotic minimax redundancy (1.2),
$$\frac32(\log n - \log 2 - 1) + \frac12\log\pi + o(1). \tag{1.9}$$
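The substitutions leading to (1.8) and (1.9) can be double-checked numerically. The minimal Python sketch below (names are ours) evaluates (1.1) directly with $d = 3$, $\log\det I = -\log(1-r^2)$ and $w = q_u$, compares the result with the closed form (1.8), and confirms that for $u = 1/2$ the $r$-dependence disappears, so that (1.8) coincides with (1.9).

```python
import math


def clarke_barron(n, d, log_det_fisher, log_prior):
    """Right-hand side of (1.1) with the o(1) term dropped (natural logarithms)."""
    return 0.5 * d * math.log(n / (2 * math.pi * math.e)) + 0.5 * log_det_fisher - log_prior


def log_q_u(r, u):
    """log of the density (1.7) at distance r from the origin."""
    return (math.lgamma(2.5 - u) - 1.5 * math.log(math.pi) - math.lgamma(1 - u)
            - u * math.log(1 - r * r))


def naive_redundancy(n, r, u):
    """(1.8) with the o(1) term dropped."""
    return (1.5 * (math.log(n) - math.log(2) - 1) - (0.5 - u) * math.log(1 - r * r)
            + math.lgamma(1 - u) - math.lgamma(2.5 - u))


def naive_minimax(n):
    """(1.9) with the o(1) term dropped."""
    return 1.5 * (math.log(n) - math.log(2) - 1) + 0.5 * math.log(math.pi)


print(naive_redundancy(1000, 0.0, 0.0), naive_minimax(1000))
```

For $u = 1/2$ the coefficient $(\frac12 - u)$ of $\log(1-r^2)$ vanishes, which is the classical equalizer property of Jeffreys' prior carried over formally.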

We shall (in Sec. […]) compare these two formulas, (1.8) and (1.9), with the results of Sec. […] and find some striking similarities and coincidences, particularly associated with the fully mixed state ($r = 0$). These findings will help to support the working hypothesis of this study, namely that there are meaningful extensions to the quantum domain of the (commutative probabilistic) theorems of Clarke and Barron. However, we find that the minimax and maximin properties of the Jeffreys' prior do not strictly carry over, but transfer only in an approximate sense, which is, nevertheless, still quite remarkable. In any case, we cannot formally rule out the possibility that the actual global (perhaps common) minimax and maximin are achieved for probability distributions not belonging to the one-parameter family $q_u$.

In analogy to […, Sec. 5.2], the matrices $\zeta_n(u)$ should prove useful for the universal version of Schumacher data compression […], […]. Schumacher's result […] must be considered as the quantum analogue of Shannon's noiseless coding theorem (see e.g. […, Sec. 5.6]). Roughly, quantum data compression, as proposed by Schumacher [47], works as follows: A (quantum) signal source ("sender") generates signal states of a quantum system $M$, the ensemble of possible signals being described by a density operator $\psi$. The signals are projected down to a "dominant" subspace of $M$; the rest is discarded. The information in this dominant subspace is transmitted through a (quantum) channel. The receiver tries to reconstruct the original signal by replacing the discarded information by some "typical" state. The quality (or faithfulness) of a coding scheme is measured by the fidelity, which is by definition the overall probability that a signal from the signal ensemble $M$ that is transmitted to the receiver passes a validation test comparing it to its original (see […, Sec. IV]). What Schumacher shows is that, for each $\epsilon > 0$ and $\delta > 0$, under the above coding scheme a compression rate of $S(\psi) + \delta$ qubits per signal is possible, where $S(\psi)$ is the von Neumann entropy of $\psi$,
$$S(\psi) = -\operatorname{Tr}\psi\log\psi, \tag{1.10}$$
at a fidelity of at least $1 - 2\epsilon$. (Thus, the von Neumann entropy is the quantum analogue of the Shannon entropy, which features in Shannon's classical noiseless coding theorem. Indeed, as is easy to see, for diagonal matrices, corresponding to classical probability densities, the right-hand side of (1.10) reduces to the Shannon entropy.) This is achieved by choosing as the dominant subspace that subspace of the quantum system $M$ which is the span of the eigenvectors of $\psi$ corresponding to the largest eigenvalues, with the property that the eigenvalues add up to at least $1 - \epsilon$.
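The choice of the dominant subspace can be sketched for blocks of $n$ independent copies of a qubit signal state $\psi$ with eigenvalues $p \ge 1-p$: the eigenvalues of $\psi^{\otimes n}$ are $p^k(1-p)^{n-k}$ with multiplicity $\binom{n}{k}$. The minimal sketch below (names are ours) adds whole eigenvalue groups at a time, so its dimension may slightly exceed the minimal one, and it measures rates in bits (binary logarithms), the unit used for counting qubits, rather than the natural logarithms used elsewhere in this paper.

```python
import math


def entropy_bits(p):
    """von Neumann entropy (1.10) of a qubit state with eigenvalues p, 1 - p, in bits."""
    return -sum(x * math.log2(x) for x in (p, 1 - p) if x > 0)


def dominant_subspace_dim(p, n, eps):
    """Dimension of the span of the leading eigenvectors of the n-fold tensor power of
    psi (eigenvalues p, 1 - p with 0 < p < 1) whose eigenvalues sum to at least 1 - eps.
    The eigenvalue p^k (1-p)^(n-k) has multiplicity C(n, k); whole groups are added."""
    if p < 0.5:
        p = 1 - p  # relabel so that p is the larger eigenvalue
    mass, dim = 0.0, 0
    for k in range(n, -1, -1):  # largest eigenvalues first
        log_group = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                     + k * math.log(p) + (n - k) * math.log(1 - p))
        mass += math.exp(log_group)
        dim += math.comb(n, k)
        if mass >= 1 - eps:
            break
    return dim


for n in (100, 1000):
    rate = math.log2(dominant_subspace_dim(0.9, n, 0.05)) / n
    print(n, rate, entropy_bits(0.9))  # qubits per signal vs. S(psi) in bits
```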

Consequently, in a universal compression scheme, we propose to project blocks of $n$ signals (qubits) onto those "typical" subspaces of $2^n$-dimensional Hilbert space corresponding to as many of the dominant eigenvalues of $\zeta_n(u)$ as it takes to exceed a sum $1 - \epsilon$. For all $u$, the leading one of the $\lfloor n/2\rfloor + 1$ distinct eigenvalues has multiplicity $n + 1$, and belongs to the $(n+1)$-dimensional (Bose-Einstein) symmetric subspace […]. (Projection onto the symmetric subspace has been proposed as a method for stabilizing quantum computations, including quantum state storage […].) For $u = 1/2$, the leading eigenvalue can be obtained by dividing the $(n+1)$-st Catalan number, that is, $\frac{1}{n+2}\binom{2n+2}{n+1}$, by $4^n$. (The Catalan numbers "are probably the most frequently occurring combinatorial numbers after the binomial coefficients" […].)
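Because the spectrum of $\zeta_n(u)$ is available in closed form (formulas (2.12) and (2.13) of Theorem 2 in Sec. 2.2), the projection just proposed can be sized without diagonalizing a $2^n \times 2^n$ matrix. A minimal Python sketch (names are ours) that also checks the unit trace, the multiplicity $n + 1$ of the leading eigenvalue, and its Catalan-number value at $u = 1/2$; whole eigenspaces are taken, so the returned dimension may overshoot the minimal one.

```python
import math


def lam(n, h, u):
    """Eigenvalue lambda_h of zeta_n(u), formula (2.12), via log-gamma for stability."""
    log_val = (math.lgamma(2.5 - u) + math.lgamma(2 + n - h - u) + math.lgamma(1 + h - u)
               - math.lgamma(2.5 + n / 2 - u) - math.lgamma(2 + n / 2 - u)
               - math.lgamma(1 - u) - n * math.log(2))
    return math.exp(log_val)


def mult(n, h):
    """Multiplicity (2.13); the integer division is exact."""
    return (n - 2 * h + 1) ** 2 * math.comb(n + 1, h) // (n + 1)


def universal_subspace_dim(n, u, eps):
    """Dimension of the leading eigenspaces of zeta_n(u) whose eigenvalues
    (counted with multiplicity) first reach a total of at least 1 - eps."""
    spectrum = sorted(((lam(n, h, u), mult(n, h)) for h in range(n // 2 + 1)), reverse=True)
    mass, dim = 0.0, 0
    for value, m in spectrum:
        mass += value * m
        dim += m
        if mass >= 1 - eps:
            break
    return dim


for n in (10, 50, 200):
    print(n, universal_subspace_dim(n, 0.5, 0.05), 2 ** n)
```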

Let us point out to the reader the quite recent important work of Petz and Sudár [42]. They demonstrated that in the quantum case, in contrast to the classical situation in which there is, as originally shown by Chentsov […], essentially only one monotone metric and, therefore, essentially only one form of the Fisher information, there exists an infinitude of such metrics. "The monotonicity of the Riemannian metric $g$ is crucial when one likes to imitate the geometrical approach of [Chentsov]. An infinitesimal statistical distance has to be monotone under stochastic mappings. We note that the monotonicity of $g$ is a strengthening of the concavity of the von Neumann entropy. Indeed, positive definiteness of $g$ is equivalent to the strict concavity of the von Neumann entropy ... and monotonicity is much more than positivity" […].

The monotone metrics on the space of density matrices are given by the operator monotone functions $f(t): \mathbb{R}^+ \to \mathbb{R}^+$ such that $f(1) = 1$ and $f(t) = t f(1/t)$. For the choice $f(t) = (1+t)/2$, one obtains the minimal metric (of the symmetric logarithmic derivative), which serves as the basis of our analysis here. "In accordance with the work of Braunstein and Caves, this seems to be the canonical metric of parameter estimation theory. However, expectation values of certain relevant observables are known to lead to statistical inference theory provided by the maximum entropy principle or the minimum relative entropy principle when a priori information on the state is available. The best prediction is a kind of generalized Gibbs state. On the manifold of those states, the differentiation of the entropy functional yields the Kubo-Mori/Bogoliubov metric, which is different from the metric of the symmetric logarithmic derivative. Therefore, more than one privileged metric shows up in quantum mechanics. The exact clarification of this point requires and is worth further studies" […]. It remains a possibility, then, that a monotone metric other than the minimal one (which corresponds to $q_{0.5}$, that is, (1.6)) may yield a common global asymptotic minimax and maximin redundancy, thus fully paralleling the classical (nonquantum) results of Clarke and Barron [16], [17], [18]. We intend to investigate such a possibility, in particular, for the Kubo-Mori/Bogoliubov metric […].
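Three standard examples of such functions can be checked against the two defining conditions. In the minimal Python sketch below (names are ours), $f(t) = (1+t)/2$ gives the minimal (symmetric logarithmic derivative) metric, $f(t) = (t-1)/\log t$ the Kubo-Mori/Bogoliubov metric, and $f(t) = 2t/(1+t)$ the maximal monotone metric; the ordering checked in the last loop reflects the known fact that every admissible $f$ lies between $2t/(1+t)$ and $(1+t)/2$.

```python
import math


def f_sld(t):
    """f(t) = (1 + t)/2: minimal monotone metric (symmetric logarithmic derivative)."""
    return (1 + t) / 2


def f_kubo_mori(t):
    """f(t) = (t - 1)/log t: Kubo-Mori/Bogoliubov metric (limit value 1 at t = 1)."""
    return 1.0 if t == 1 else (t - 1) / math.log(t)


def f_max(t):
    """f(t) = 2t/(1 + t): maximal monotone metric."""
    return 2 * t / (1 + t)


for t in (0.1, 1.0, 4.0):
    print(t, f_max(t), f_kubo_mori(t), f_sld(t))
```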



2. Analysis of a One-Parameter Family of Bayesian Density Matrices 

In this section, we implement the analytical approach described in the Introduction to extending the work of Clarke and Barron […], […] to the realm of quantum mechanics, specifically, the two-level systems. Such systems are representable by density matrices $\rho$ of the form (1.4). A composite system of $n$ independent (unentangled) and identical two-level quantum systems is, then, represented by the $n$-fold tensor product $\bigotimes^n\rho$. In Theorem 1 of Sec. 2.1, we average $\bigotimes^n\rho$ with respect to the one-parameter family of probability densities defined in (1.7), obtaining the Bayesian density matrices $\zeta_n(u)$ and formulas for their $2^{2n}$ entries. Then, in Theorem 2 of Sec. 2.2, we are able to explicitly determine the $2^n$ eigenvalues and eigenvectors of $\zeta_n(u)$. Using these results, in Sec. 2.3, we compute the relative entropy of $\bigotimes^n\rho$ with respect to $\zeta_n(u)$. Then, in Sec. 2.4, we obtain the asymptotics of this relative entropy for $n \to \infty$. In Sec. 2.5, we compute the asymptotics of the von Neumann entropy (see (1.10)) of $\zeta_n(u)$. All these results will enable us, in Sec. […], to ascertain to what extent the results of Clarke and Barron could be said to carry over to the quantum domain.



2.1. Entries of the Bayesian density matrices $\zeta_n(u)$. The $n$-fold tensor product $\bigotimes^n\rho$ is a $2^n \times 2^n$ matrix. To refer to specific rows and columns of $\bigotimes^n\rho$, we index them by subsets of the $n$-element set $\{1, 2, \dots, n\}$. We choose to employ this notation instead of the more familiar use of binary strings, in order to have a more succinct way of writing our formulas. For convenience, we will subsequently write $[n]$ for $\{1, 2, \dots, n\}$. Thus, $\bigotimes^n\rho$ can be written in the form
$$\bigotimes^n\rho = (R_{IJ})_{I,J \subseteq [n]},$$
where
$$R_{IJ} = \frac{1}{2^n}(1+z)^{n_{ee}}(1-z)^{n_{\bar e\bar e}}(x+iy)^{n_{\bar e e}}(x-iy)^{n_{e\bar e}}, \tag{2.1}$$
with $n_{ee}$ denoting the number of elements of $[n]$ contained in both $I$ and $J$, $n_{\bar e\bar e}$ denoting the number of elements contained in neither $I$ nor $J$, $n_{\bar e e}$ denoting the number of elements not in $I$ but in $J$, and $n_{e\bar e}$ denoting the number of elements in $I$ but not in $J$. In symbols,
$$n_{ee} = |I \cap J|, \quad n_{\bar e\bar e} = |[n] \setminus (I \cup J)|, \quad n_{\bar e e} = |J \setminus I|, \quad n_{e\bar e} = |I \setminus J|.$$
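Formula (2.1) presupposes a convention for matching subsets with rows and columns of the Kronecker product. The minimal NumPy sketch below assumes (our choice; the text does not fix it) that element $k \in [n]$ refers to the $k$-th tensor factor and that $k \in I$ selects the first basis vector of that factor. With this convention, every entry of the tensor power built by `np.kron` agrees with (2.1).

```python
import itertools
from functools import reduce

import numpy as np


def bloch_density(x, y, z):
    """The 2x2 density matrix (1.4)."""
    return 0.5 * np.array([[1 + z, x - 1j * y], [x + 1j * y, 1 - z]])


def index_of(subset, n):
    """Kronecker-product row/column of a subset of [n] (assumed convention:
    factor k is in basis state 0 if k is in the subset, in state 1 otherwise)."""
    return sum((0 if k in subset else 1) << (n - k) for k in range(1, n + 1))


def R_entry(I, J, n, x, y, z):
    """Entry R_IJ of the n-fold tensor power according to (2.1)."""
    full = set(range(1, n + 1))
    n_ee, n_bb = len(I & J), len(full - (I | J))  # in both / in neither
    n_be, n_eb = len(J - I), len(I - J)           # only in J / only in I
    return ((1 + z) ** n_ee * (1 - z) ** n_bb
            * (x + 1j * y) ** n_be * (x - 1j * y) ** n_eb) / 2 ** n


n, x, y, z = 3, 0.3, -0.2, 0.5
big = reduce(np.kron, [bloch_density(x, y, z)] * n)
subsets = [set(c) for m in range(n + 1) for c in itertools.combinations(range(1, n + 1), m)]
err = max(abs(big[index_of(I, n), index_of(J, n)] - R_entry(I, J, n, x, y, z))
          for I in subsets for J in subsets)
print(err)  # agreement up to rounding
```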

We consider the average $\zeta_n(u)$ of $\bigotimes^n\rho$ with respect to the probability density $q_u = q_u(x, y, z)$ defined in (1.7), taken over the unit ball $\{(x, y, z) : x^2 + y^2 + z^2 \le 1\}$. This average can be described explicitly as follows.

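Theorem 1. The average $\zeta_n(u)$,
$$\int \Big(\bigotimes^n\rho\Big)\, q_u(x, y, z)\,dx\,dy\,dz,$$
equals the matrix $(\zeta_{IJ})_{I,J\subseteq[n]}$, where
$$\zeta_{IJ} = \delta_{n_{\bar e e}, n_{e\bar e}}\,\frac{1}{2^n}\,\frac{\Gamma(\frac52 - u)\,\Gamma(2 + \frac n2 + \frac{n_{ee}}2 - \frac{n_{\bar e\bar e}}2 - u)\,\Gamma(2 + \frac n2 - \frac{n_{ee}}2 + \frac{n_{\bar e\bar e}}2 - u)\,\big(\frac{n - n_{ee} - n_{\bar e\bar e}}{2}\big)!}{\Gamma(\frac52 + \frac n2 - u)\,\Gamma(2 + \frac n2 - u)\,\Gamma(2 + \frac n2 - \frac{n_{ee}}2 - \frac{n_{\bar e\bar e}}2 - u)}. \tag{2.2}$$
Here, $\delta_{ij}$ denotes the Kronecker delta, $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ otherwise.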

Remark. It is important for later considerations to observe that because of the term $\delta_{n_{\bar e e}, n_{e\bar e}}$ in (2.2) the entry $\zeta_{IJ}$ is nonzero if and only if the sets $I$ and $J$ have the same cardinality. If $I$ and $J$ have the same cardinality, $c$ say, then $\zeta_{IJ}$ only depends on $n_{ee}$, the number of common elements of $I$ and $J$, since in this case $n_{\bar e\bar e}$ is expressible as $n - 2c + n_{ee}$.

Proof of Theorem 1. To compute $\zeta_{IJ}$, we have to compute the integral
$$\int R_{IJ}\,q_u(x, y, z)\,dx\,dy\,dz. \tag{2.3}$$
For convenience, we treat the case that $n_{ee} \ge n_{\bar e\bar e}$ and $n_{\bar e e} \ge n_{e\bar e}$. The other cases are treated similarly.
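First, we rewrite the matrix entries $R_{IJ}$,
$$\begin{aligned}
R_{IJ} &= \frac{1}{2^n}(1+z)^{n_{ee}}(1-z)^{n_{\bar e\bar e}}(x+iy)^{n_{\bar e e}}(x-iy)^{n_{e\bar e}}\\
&= \frac{1}{2^n}(1-z^2)^{n_{\bar e\bar e}}(1+z)^{n_{ee}-n_{\bar e\bar e}}(x^2+y^2)^{n_{e\bar e}}(x+iy)^{n_{\bar e e}-n_{e\bar e}}\\
&= \frac{1}{2^n}\sum_{j,k,l\ge0}(-1)^j\,i^l\binom{n_{\bar e\bar e}}{j}\binom{n_{ee}-n_{\bar e\bar e}}{k}\binom{n_{\bar e e}-n_{e\bar e}}{l}\,z^{2j+k}(x^2+y^2)^{n_{e\bar e}}\,x^{n_{\bar e e}-n_{e\bar e}-l}\,y^{l}.
\end{aligned} \tag{2.4}$$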



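Of course, in order to compute the integral (2.3), we transform the Cartesian coordinates into polar coordinates,
$$x = r\sin\vartheta\cos\varphi, \quad y = r\sin\vartheta\sin\varphi, \quad z = r\cos\vartheta, \qquad 0 \le r \le 1,\ 0 \le \varphi < 2\pi,\ 0 \le \vartheta \le \pi.$$
Thus, using (2.4), the integral (2.3) is transformed into
$$\frac{1}{2^n}\sum_{j,k,l\ge0}(-1)^j\,i^l\binom{n_{\bar e\bar e}}{j}\binom{n_{ee}-n_{\bar e\bar e}}{k}\binom{n_{\bar e e}-n_{e\bar e}}{l}\int_0^1\!\!\int_0^\pi\!\!\int_0^{2\pi} r^{2j+k+n_{\bar e e}+n_{e\bar e}+2}\,(\cos^{2j+k}\vartheta)(\sin^{n_{\bar e e}+n_{e\bar e}+1}\vartheta)\,(\cos^{n_{\bar e e}-n_{e\bar e}-l}\varphi)(\sin^{l}\varphi)\,\frac{\Gamma(5/2-u)}{\pi^{3/2}\,\Gamma(1-u)\,(1-r^2)^{u}}\,d\varphi\,d\vartheta\,dr. \tag{2.5}$$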

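To evaluate this triple integral we use the following standard formulas:
$$\int_0^{2\pi}\sin^{2M}\varphi\,\cos^{2N}\varphi\,d\varphi = 2\pi\,\frac{(2M-1)!!\,(2N-1)!!}{(2M+2N)!!}, \tag{2.6a}$$
$$\int_0^{\pi}\sin^{2M+1}\vartheta\,\cos^{2N}\vartheta\,d\vartheta = 2\,\frac{(2M)!!\,(2N-1)!!}{(2M+2N+1)!!}, \tag{2.6b}$$
and
$$\int_0^{2\pi}\sin^{2M+1}\varphi\,\cos^{2N}\varphi\,d\varphi = 0, \tag{2.6c}$$
$$\int_0^{2\pi}\sin^{2M}\varphi\,\cos^{2N+1}\varphi\,d\varphi = 0, \tag{2.6d}$$
$$\int_0^{\pi}\sin^{2M+1}\vartheta\,\cos^{2N+1}\vartheta\,d\vartheta = \int_0^{2\pi}\sin^{2M+1}\varphi\,\cos^{2N+1}\varphi\,d\varphi = 0, \tag{2.6e}$$
for any nonnegative integers $M$ and $N$ (with $(-1)!! = 0!! = 1$). Furthermore, we need the beta integral ($u < 1$)
$$\int_0^1 \frac{r^{2M}}{(1-r^2)^{u}}\,dr = \frac{\Gamma(M+\frac12)\,\Gamma(1-u)}{2\,\Gamma(M+\frac32-u)}. \tag{2.7}$$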
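The formulas (2.6a), (2.6b) and the beta integral (2.7) are easy to spot-check numerically. The minimal Python sketch below (names are ours) uses the composite midpoint rule; for (2.7) it substitutes $r = \sin t$ and only tests $u \le 0$, where the transformed integrand $\sin^{2M}t\,\cos^{1-2u}t$ is smooth.

```python
import math


def dfact(k):
    """Double factorial, with (-1)!! = 0!! = 1."""
    return 1 if k <= 0 else k * dfact(k - 2)


def midpoint(f, a, b, num=10000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / num
    return h * sum(f(a + (i + 0.5) * h) for i in range(num))


def check_26a(M, N):
    lhs = midpoint(lambda t: math.sin(t) ** (2 * M) * math.cos(t) ** (2 * N), 0, 2 * math.pi)
    return lhs - 2 * math.pi * dfact(2 * M - 1) * dfact(2 * N - 1) / dfact(2 * M + 2 * N)


def check_26b(M, N):
    lhs = midpoint(lambda t: math.sin(t) ** (2 * M + 1) * math.cos(t) ** (2 * N), 0, math.pi)
    return lhs - 2 * dfact(2 * M) * dfact(2 * N - 1) / dfact(2 * M + 2 * N + 1)


def check_27(M, u):
    # r = sin(t): int_0^1 r^(2M) (1 - r^2)^(-u) dr = int_0^{pi/2} sin(t)^(2M) cos(t)^(1 - 2u) dt
    lhs = midpoint(lambda t: math.sin(t) ** (2 * M) * math.cos(t) ** (1 - 2 * u), 0, math.pi / 2)
    return lhs - math.gamma(M + 0.5) * math.gamma(1 - u) / (2 * math.gamma(M + 1.5 - u))


print(max(abs(check_26a(M, N)) + abs(check_26b(M, N)) for M in range(4) for N in range(4)))
```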



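10 CHRISTIAN KRATTENTHALER AND PAUL B. SLATER

Now we consider the integral over $\varphi$ in (2.5). Using (2.6c) and (2.6d), we see that each summand in (2.5) vanishes if $n_{\bar e e}$ has a parity different from $n_{e\bar e}$. On the other hand, if $n_{\bar e e}$ has the same parity as $n_{e\bar e}$, then we can evaluate the integrals over $\varphi$ using (2.6a) and (2.6e). Discarding for a moment the terms independent of $\varphi$ and $l$, we have
$$\begin{aligned}
\sum_{l\ge0} i^l\binom{n_{\bar e e}-n_{e\bar e}}{l}\int_0^{2\pi}(\cos^{n_{\bar e e}-n_{e\bar e}-l}\varphi)(\sin^{l}\varphi)\,d\varphi
&= 2\pi\sum_{l\ge0}(-1)^l\binom{n_{\bar e e}-n_{e\bar e}}{2l}\frac{(2l-1)!!\,(n_{\bar e e}-n_{e\bar e}-2l-1)!!}{(n_{\bar e e}-n_{e\bar e})!!}\\
&= 2\pi\,\frac{(n_{\bar e e}-n_{e\bar e}-1)!!}{(n_{\bar e e}-n_{e\bar e})!!}\sum_{l\ge0}(-1)^l\binom{(n_{\bar e e}-n_{e\bar e})/2}{l}\\
&= 2\pi\,\delta_{n_{\bar e e},n_{e\bar e}},
\end{aligned}$$
the last line being due to the binomial theorem. These considerations reduce (2.5) to
$$\frac{1}{2^n}\,\delta_{n_{\bar e e},n_{e\bar e}}\sum_{j,k\ge0}(-1)^j\binom{n_{\bar e\bar e}}{j}\binom{n_{ee}-n_{\bar e\bar e}}{k}\int_0^1\!\!\int_0^\pi r^{2j+k+2n_{\bar e e}+2}\,(\cos^{2j+k}\vartheta)(\sin^{2n_{\bar e e}+1}\vartheta)\,\frac{2\,\Gamma(5/2-u)}{\pi^{1/2}\,\Gamma(1-u)\,(1-r^2)^{u}}\,d\vartheta\,dr.$$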

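Using (2.6b), (2.6e) and (2.7), this can be further simplified to
$$\delta_{n_{\bar e e},n_{e\bar e}}\,\frac{1}{2^n}\sum_{j,k\ge0}(-1)^j\binom{n_{\bar e\bar e}}{j}\binom{n_{ee}-n_{\bar e\bar e}}{2k}\,\frac{2\,(2j+2k-1)!!\,(2n_{\bar e e})!!}{(2j+2k+2n_{\bar e e}+1)!!}\cdot\frac{\Gamma(j+k+n_{\bar e e}+\frac32)\,\Gamma(1-u)}{2\,\Gamma(j+k+n_{\bar e e}+\frac52-u)}\cdot\frac{2\,\Gamma(\frac52-u)}{\pi^{1/2}\,\Gamma(1-u)}. \tag{2.8}$$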

Next we interchange the sums over $j$ and $k$ and write the sum over $j$ in terms of the standard hypergeometric notation
$${}_rF_s\!\left[\begin{matrix}a_1, \dots, a_r\\ b_1, \dots, b_s\end{matrix}; z\right] = \sum_{k=0}^{\infty}\frac{(a_1)_k\cdots(a_r)_k}{k!\,(b_1)_k\cdots(b_s)_k}\,z^k,$$
where the shifted factorial $(a)_k$ is given by $(a)_k := a(a+1)\cdots(a+k-1)$, $k \ge 1$, $(a)_0 := 1$. Thus we can write (2.8) in the form



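$$\delta_{n_{\bar e e},n_{e\bar e}}\sum_{k\ge0}\binom{n_{ee}-n_{\bar e\bar e}}{2k}\,\frac{(2k-1)!!\,n_{\bar e e}!\,\Gamma(\frac52-u)}{2^{n+k}\,\Gamma(\frac52+k+n_{\bar e e}-u)}\;{}_2F_1\!\left[\begin{matrix}\frac12+k,\ -n_{\bar e\bar e}\\ \frac52+k+n_{\bar e e}-u\end{matrix}; 1\right]. \tag{2.9}$$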



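The ${}_2F_1$ series can be summed by means of Gauß' ${}_2F_1$ summation (see e.g. […, (1.7.6); Appendix (III.3)])
$${}_2F_1\!\left[\begin{matrix}a,\ b\\ c\end{matrix}; 1\right] = \frac{\Gamma(c)\,\Gamma(c-a-b)}{\Gamma(c-a)\,\Gamma(c-b)}, \tag{2.10}$$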

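provided the series terminates or $\operatorname{Re}(c-a-b) > 0$. Applying (2.10) to the ${}_2F_1$ in (2.9) (observe that it is terminating) and writing the sum over $k$ as a hypergeometric series, the expression (2.9) becomes
$$\delta_{n_{\bar e e},n_{e\bar e}}\,\frac{1}{2^n}\,\frac{\Gamma(2+n_{\bar e\bar e}+n_{\bar e e}-u)\,\Gamma(\frac52-u)\,n_{\bar e e}!}{\Gamma(\frac52+n_{\bar e\bar e}+n_{\bar e e}-u)\,\Gamma(2+n_{\bar e e}-u)}\;{}_2F_1\!\left[\begin{matrix}-\frac{n_{ee}}2+\frac{n_{\bar e\bar e}}2,\ \frac12-\frac{n_{ee}}2+\frac{n_{\bar e\bar e}}2\\ \frac52+n_{\bar e\bar e}+n_{\bar e e}-u\end{matrix}; 1\right].$$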

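Another application of (2.10) gives
$$\delta_{n_{\bar e e},n_{e\bar e}}\,\frac{1}{2^n}\,\frac{\Gamma(2+n_{ee}+n_{\bar e e}-u)\,\Gamma(2+n_{\bar e\bar e}+n_{\bar e e}-u)\,\Gamma(\frac52-u)\,n_{\bar e e}!}{\Gamma(\frac52+\frac{n_{ee}}2+\frac{n_{\bar e\bar e}}2+n_{\bar e e}-u)\,\Gamma(2+\frac{n_{ee}}2+\frac{n_{\bar e\bar e}}2+n_{\bar e e}-u)\,\Gamma(2+n_{\bar e e}-u)}. \tag{2.11}$$
Trivially, we have $n = n_{ee} + n_{\bar e\bar e} + n_{\bar e e} + n_{e\bar e}$. Since (2.11) vanishes unless $n_{\bar e e} = n_{e\bar e}$, we can substitute $(n - n_{ee} - n_{\bar e\bar e})/2$ for $n_{\bar e e}$ in the arguments of the gamma functions and in the factorial. Thus, we see that (2.11) equals (2.2). This completes the proof of the Theorem. □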
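Theorem 1 can also be spot-checked numerically. After the $\varphi$-integration, an entry with $n_{\bar e e} = n_{e\bar e} = m$ reduces to a double integral over $r$ and $t = \cos\vartheta$ of $(1+rt)^{n_{ee}}(1-rt)^{n_{\bar e\bar e}}\big(r^2(1-t^2)\big)^m r^2(1-r^2)^{-u}$, times $2\pi\,\Gamma(\frac52-u)/(2^n\pi^{3/2}\Gamma(1-u))$. For integer $u \le 0$ this integrand is a polynomial, so Gauss-Legendre quadrature of sufficient order evaluates it exactly up to rounding. The NumPy sketch below (names are ours) compares this with (2.2).

```python
import math

import numpy as np


def zeta_entry(n, n_ee, n_bb, n_be, n_eb, u):
    """Entry (2.2); n_bb, n_be, n_eb stand for n_{bar e bar e}, n_{bar e e}, n_{e bar e}."""
    if n_be != n_eb:
        return 0.0
    m = (n - n_ee - n_bb) // 2
    num = (math.gamma(2.5 - u) * math.gamma(2 + n / 2 + n_ee / 2 - n_bb / 2 - u)
           * math.gamma(2 + n / 2 - n_ee / 2 + n_bb / 2 - u) * math.factorial(m))
    den = (math.gamma(2.5 + n / 2 - u) * math.gamma(2 + n / 2 - u)
           * math.gamma(2 + n / 2 - n_ee / 2 - n_bb / 2 - u))
    return num / den / 2 ** n


def zeta_entry_quadrature(n, n_ee, n_bb, m, u, order=30):
    """Same entry (for n_be = n_eb = m) by tensor Gauss-Legendre quadrature in (r, t)."""
    nodes, weights = np.polynomial.legendre.leggauss(order)
    r, wr = 0.5 * (nodes + 1.0), 0.5 * weights  # map [-1, 1] onto [0, 1] for r
    R, T = np.meshgrid(r, nodes, indexing="ij")
    W = np.outer(wr, weights)
    integrand = ((1 + R * T) ** n_ee * (1 - R * T) ** n_bb * (R * R * (1 - T * T)) ** m
                 * R * R * (1 - R * R) ** (-u))
    c_u = math.gamma(2.5 - u) / (math.pi ** 1.5 * math.gamma(1 - u))
    return 2 * math.pi * c_u * float(np.sum(W * integrand)) / 2 ** n


print(zeta_entry(4, 2, 0, 1, 1, 0.0), zeta_entry_quadrature(4, 2, 0, 1, 0.0))
```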

2.2. Eigenvalues and eigenvectors of the Bayesian density matrices $\zeta_n(u)$.

With the explicit description of the result $\zeta_n(u)$ of averaging $\bigotimes^n\rho$ with respect to $q_u$ at our disposal, we now proceed to describe the eigenvalues and eigenspaces of $\zeta_n(u)$. The eigenvalues are given in Theorem 2. Lemma 4 gives a complete set of eigenvectors of $\zeta_n(u)$. The reader should note that, though complete, this is simply a set of linearly independent eigenvectors and not a fully orthogonal set.

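Theorem 2. The eigenvalues of the $2^n \times 2^n$ matrix $\zeta_n(u)$, the entries of which are given by (2.2), are
$$\lambda_h = \frac{1}{2^n}\,\frac{\Gamma(\frac52-u)\,\Gamma(2+n-h-u)\,\Gamma(1+h-u)}{\Gamma(\frac52+\frac n2-u)\,\Gamma(2+\frac n2-u)\,\Gamma(1-u)}, \qquad h = 0, 1, \dots, \lfloor n/2\rfloor, \tag{2.12}$$
with respective multiplicities
$$\frac{(n-2h+1)^2}{n+1}\binom{n+1}{h}. \tag{2.13}$$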

The Theorem will follow from a sequence of Lemmas. We state the Lemmas first, then prove Theorem 2 assuming the truth of the Lemmas, and after that provide proofs of the Lemmas.
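Before turning to the Lemmas, Theorem 2 can be confirmed numerically for small $n$ by building the full matrix from (2.2) and diagonalizing it. The NumPy sketch below (names are ours; subsets play the role of row and column indices) compares the computed spectrum with the multiset of eigenvalues (2.12) repeated according to (2.13).

```python
import itertools
import math

import numpy as np


def zeta_entry(n, n_ee, n_bb, n_be, n_eb, u):
    """Entry (2.2); n_bb, n_be, n_eb stand for n_{bar e bar e}, n_{bar e e}, n_{e bar e}."""
    if n_be != n_eb:
        return 0.0
    m = (n - n_ee - n_bb) // 2
    num = (math.gamma(2.5 - u) * math.gamma(2 + n / 2 + n_ee / 2 - n_bb / 2 - u)
           * math.gamma(2 + n / 2 - n_ee / 2 + n_bb / 2 - u) * math.factorial(m))
    den = (math.gamma(2.5 + n / 2 - u) * math.gamma(2 + n / 2 - u)
           * math.gamma(2 + n / 2 - n_ee / 2 - n_bb / 2 - u))
    return num / den / 2 ** n


def zeta_matrix(n, u):
    """Full 2^n x 2^n matrix zeta_n(u), rows and columns indexed by subsets of [n]."""
    full = frozenset(range(1, n + 1))
    subsets = [frozenset(c) for k in range(n + 1)
               for c in itertools.combinations(range(1, n + 1), k)]
    Z = np.empty((len(subsets), len(subsets)))
    for a, I in enumerate(subsets):
        for b, J in enumerate(subsets):
            Z[a, b] = zeta_entry(n, len(I & J), len(full - (I | J)), len(J - I), len(I - J), u)
    return Z


def lam(n, h, u):
    """Eigenvalue (2.12)."""
    return (math.gamma(2.5 - u) * math.gamma(2 + n - h - u) * math.gamma(1 + h - u)
            / (math.gamma(2.5 + n / 2 - u) * math.gamma(2 + n / 2 - u) * math.gamma(1 - u))
            / 2 ** n)


def mult(n, h):
    """Multiplicity (2.13)."""
    return (n - 2 * h + 1) ** 2 * math.comb(n + 1, h) // (n + 1)


n, u = 5, 0.3
computed = np.sort(np.linalg.eigvalsh(zeta_matrix(n, u)))
predicted = np.sort(np.array([lam(n, h, u) for h in range(n // 2 + 1) for _ in range(mult(n, h))]))
print(np.max(np.abs(computed - predicted)))
```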

In the first Lemma some eigenvectors of the matrix $\zeta_n(u)$ are described. Clearly, since $\zeta_n(u)$ is a $2^n \times 2^n$ matrix, the eigenvectors are in $2^n$-dimensional space. As we did previously, we index coordinates by subsets of $[n]$, so that a generic vector is $(x_S)_{S\subseteq[n]}$. In particular, given a subset $T$ of $[n]$, the symbol $e_T$ denotes the standard unit vector with a 1 in the $T$-th coordinate and 0 elsewhere, i.e., $e_T = (\delta_{S,T})_{S\subseteq[n]}$.

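Now let $h, s$ be integers with $0 \le h \le s \le n - h$ and let $A$ and $B$ be two disjoint $h$-element subsets of $[n]$. Then we define the vector $v_{h,s}(A, B)$ by
$$v_{h,s}(A, B) := \sum_{\substack{X \subseteq A\\ Y \subseteq [n]\setminus(A\cup B),\ |Y| = s-h}} (-1)^{|X|}\,e_{X \cup X' \cup Y}, \tag{2.14}$$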
where $X'$ is the complement of $X$ in $B$, by which we mean that if $X$ consists of the $i_1$-, $i_2$-, ...-largest elements of $A$, $i_1 < i_2 < \cdots$, then $X'$ consists of all elements of $B$ except for the $i_1$-, $i_2$-, ...-largest elements of $B$. For example, let $n = 7$. Then the vector $v_{2,3}(\{1,3\}, \{2,5\})$ is given by
$$e_{\{2,4,5\}} + e_{\{2,5,6\}} + e_{\{2,5,7\}} - e_{\{1,4,5\}} - e_{\{1,5,6\}} - e_{\{1,5,7\}} - e_{\{2,3,4\}} - e_{\{2,3,6\}} - e_{\{2,3,7\}} + e_{\{1,3,4\}} + e_{\{1,3,6\}} + e_{\{1,3,7\}}. \tag{2.15}$$
(In this special case, the possible subsets $X$ of $A = \{1, 3\}$ in the sum in (2.14) are $\emptyset$, $\{1\}$, $\{3\}$, $\{1, 3\}$, with corresponding complements in $B = \{2, 5\}$ being $\{2, 5\}$, $\{5\}$, $\{2\}$, $\emptyset$, respectively, and the possible sets $Y$ are $\{4\}$, $\{6\}$, $\{7\}$.) Observe that all sets $X \cup X' \cup Y$ which occur as indices in (2.14) have the same cardinality $s$.



Lemma 3. Let $h, s$ be integers with $0 \le h \le s \le n - h$ and let $A$ and $B$ be disjoint $h$-element subsets of $[n]$. Then $v_{h,s}(A, B)$ as defined in (2.14) is an eigenvector of the matrix $\zeta_n(u)$, the entries of which are given by (2.2), for the eigenvalue $\lambda_h$, where $\lambda_h$ is given by (2.12).
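Lemma 3 can be checked on the running example (2.15). The pure-Python sketch below (names are ours) builds $v_{h,s}(A,B)$ from (2.14), pairing the elements of $A$ and $B$ by their position in increasing order (which agrees with the "$i$-th largest" description), and verifies $\zeta_7(u)\,v_{2,3}(\{1,3\},\{2,5\}) = \lambda_2\,v_{2,3}(\{1,3\},\{2,5\})$ using the entries (2.2). By the Remark after Theorem 1, only rows indexed by sets of cardinality $s$ can be nonzero.

```python
import itertools
import math


def zeta_entry(n, n_ee, n_bb, n_be, n_eb, u):
    """Entry (2.2); n_bb, n_be, n_eb stand for n_{bar e bar e}, n_{bar e e}, n_{e bar e}."""
    if n_be != n_eb:
        return 0.0
    m = (n - n_ee - n_bb) // 2
    num = (math.gamma(2.5 - u) * math.gamma(2 + n / 2 + n_ee / 2 - n_bb / 2 - u)
           * math.gamma(2 + n / 2 - n_ee / 2 + n_bb / 2 - u) * math.factorial(m))
    den = (math.gamma(2.5 + n / 2 - u) * math.gamma(2 + n / 2 - u)
           * math.gamma(2 + n / 2 - n_ee / 2 - n_bb / 2 - u))
    return num / den / 2 ** n


def lam(n, h, u):
    """Eigenvalue (2.12)."""
    return (math.gamma(2.5 - u) * math.gamma(2 + n - h - u) * math.gamma(1 + h - u)
            / (math.gamma(2.5 + n / 2 - u) * math.gamma(2 + n / 2 - u) * math.gamma(1 - u))
            / 2 ** n)


def v_hs(n, h, s, A, B):
    """Vector (2.14) as a dict {index set: coefficient}."""
    A_sorted, B_sorted = sorted(A), sorted(B)
    rest = sorted(set(range(1, n + 1)) - set(A) - set(B))
    vec = {}
    for mask in itertools.product((False, True), repeat=h):
        X = {a for a, chosen in zip(A_sorted, mask) if chosen}
        X_prime = {b for b, chosen in zip(B_sorted, mask) if not chosen}  # complement of X in B
        for Y in itertools.combinations(rest, s - h):
            vec[frozenset(X | X_prime | set(Y))] = (-1) ** len(X)
    return vec


def apply_zeta(n, u, vec):
    """(zeta_n(u) v)_I for all I with the common cardinality of the support of v."""
    full = frozenset(range(1, n + 1))
    size = len(next(iter(vec)))
    return {I: sum(c * zeta_entry(n, len(I & J), len(full - (I | J)), len(J - I), len(I - J), u)
                   for J, c in vec.items())
            for I in map(frozenset, itertools.combinations(range(1, n + 1), size))}


v = v_hs(7, 2, 3, {1, 3}, {2, 5})
w = apply_zeta(7, 0.5, v)
print(max(abs(w[I] - lam(7, 2, 0.5) * v.get(I, 0)) for I in w))  # zero up to rounding
```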



We want to show that the multiplicity of $\lambda_h$ equals the expression in (2.13). Of course, Lemma 3 gives many more eigenvectors for $\lambda_h$. Therefore, in order to describe a basis for the corresponding eigenspace, we have to restrict the collection of vectors in Lemma 3.

We do this in the following way. Fix $h$, $0\le h\le\lfloor n/2\rfloor$. Let $P$ be a lattice path in the plane integer lattice $\mathbb Z^2$, starting in $(0,0)$, consisting of $n-h$ up-steps $(1,1)$ and $h$ down-steps $(1,-1)$, which never goes below the $x$-axis. Figure 1 displays an example with $n=7$ and $h=2$. Clearly, the end point of $P$ is $(n,n-2h)$. We call a lattice path which starts in $(0,0)$ and never goes below the $x$-axis a ballot path. (This terminology is motivated by its relation to the (two-candidate) ballot problem, see e.g. [?, Ch. 1, Sec. 1]. An alternative term for ballot path which is often used is "Dyck path", see e.g. [?, 1-12].) We will use the abbreviation "b.p." for "ballot path" in displayed formulas.




[Figure 1: Ballot paths]

Given such a lattice path $P$, label the steps from 1 to $n$, as is indicated in Figure 1. Then define $A_P$ to be the set of all labels corresponding to the first $h$ up-steps of $P$ and $B_P$ to be the set of all labels corresponding to the $h$ down-steps of $P$. In the example of Figure 1 we have for the choice $h=2$ that $A_P=\{1,3\}$ and $B_P=\{2,5\}$. Thus, to each $h$ and $s$, $0\le h\le s\le n-h$, and $P$ as above we can associate the vector $v_{h,s}(A_P,B_P)$. In our running example of Figure 1 the vector $v_{2,3}(P)$ would hence be $v_{2,3}(\{1,3\},\{2,5\})$, the vector in (2.15). To have a more concise form of notation, we will write $v_{h,s}(P)$ for $v_{h,s}(A_P,B_P)$ from now on.
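A small sketch of how $A_P$ and $B_P$ are read off a ballot path, assuming a path is encoded as a string of `U`/`D` steps (an encoding chosen here for illustration). For $n=7$, $h=2$ and $B_P=\{2,5\}$ the path is forced to be `UDUUDUU`, so this reproduces the running example. The hypothetical helper `front_portion` anticipates the front portion $F_P$ used in the proof of Lemma 4.

```python
def labels_from_path(steps, h):
    """Return (A_P, B_P) for a ballot path given as a string over 'U'/'D'.

    Step i (1-based) gets label i. A_P = labels of the first h up-steps,
    B_P = labels of the h down-steps. Raises ValueError if the word is not a
    ballot path from (0, 0) to (n, n - 2h)."""
    height, ups, downs = 0, [], []
    for label, step in enumerate(steps, start=1):
        if step == "U":
            height += 1
            ups.append(label)
        elif step == "D":
            height -= 1
            downs.append(label)
        else:
            raise ValueError(f"unknown step {step!r}")
        if height < 0:
            raise ValueError("path goes below the x-axis")
    if len(downs) != h:
        raise ValueError("path does not end at (n, n - 2h)")
    return set(ups[:h]), set(downs)

def front_portion(steps, h):
    """Prefix of the path up to and including its h-th up-step."""
    if h == 0:
        return ""
    seen = 0
    for i, step in enumerate(steps):
        if step == "U":
            seen += 1
            if seen == h:
                return steps[:i + 1]
    raise ValueError("path has fewer than h up-steps")

print(labels_from_path("UDUUDUU", 2))  # ({1, 3}, {2, 5})
print(front_portion("UDUUDUU", 2))     # UDU, i.e. the subpath from (0,0) to (3,1)
```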

Lemma 4. The set of vectors

$$\{v_{h,s}(P):0\le h\le s\le n-h,\ P\text{ a ballot path from }(0,0)\text{ to }(n,n-2h)\}\qquad(2.16)$$

is linearly independent.

The final Lemma tells us how many such vectors $v_{h,s}(P)$ there are.

Lemma 5. The number of ballot paths from $(0,0)$ to $(n,n-2h)$ is $\frac{n-2h+1}{n+1}\binom{n+1}{h}$. The total number of all vectors in the set (2.16) is $2^n$.
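Lemma 5 can be sanity-checked by brute force for small $n$. The sketch below enumerates ballot paths directly and compares the count with $\frac{n-2h+1}{n+1}\binom{n+1}{h}$; since each ballot path contributes one vector for each of the $n-2h+1$ admissible values of $s$, it also recovers the total $2^n$. The enumeration is exponential in $n$ and is meant only as an illustration.

```python
from itertools import product
from math import comb

def ballot_paths(n, h):
    """Brute force: all words over {U, D} of length n with h D's never below the x-axis."""
    paths = []
    for word in product("UD", repeat=n):
        if word.count("D") != h:
            continue
        height = 0
        for step in word:
            height += 1 if step == "U" else -1
            if height < 0:
                break
        else:  # no break: the path stayed weakly above the x-axis
            paths.append("".join(word))
    return paths

def ballot_count(n, h):
    """Lemma 5: (n - 2h + 1)/(n + 1) * C(n + 1, h); the integer division is exact."""
    return (n - 2 * h + 1) * comb(n + 1, h) // (n + 1)

n = 7
print([len(ballot_paths(n, h)) for h in range(n // 2 + 1)])  # [1, 6, 14, 14]
print([ballot_count(n, h) for h in range(n // 2 + 1)])       # [1, 6, 14, 14]
```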

Now, let us for a moment assume that Lemmas 3, 4 and 5 are already proved. Then, Theorem 2 follows immediately, as it turns out.

Proof of Theorem 2. Consider the set of vectors in (2.16). By Lemma 3 we know that it consists of eigenvectors for the matrix $\zeta_n(u)$. In addition, Lemma 4 tells us that this set of vectors is linearly independent. Furthermore, by Lemma 5 the number of vectors in this set is exactly $2^n$, which is the dimension of the space where all these vectors are contained. Therefore, they must form a basis of the space.

Lemma 3 says more precisely that $v_{h,s}(P)$ is an eigenvector for the eigenvalue $\lambda_h$. From what we already know, this implies that for fixed $h$ the set

$$\{v_{h,s}(P):h\le s\le n-h,\ P\text{ a ballot path from }(0,0)\text{ to }(n,n-2h)\}$$

forms a basis for the eigenspace corresponding to $\lambda_h$. Therefore, the dimension of the eigenspace corresponding to $\lambda_h$ equals the number of possible values of $s$, namely $n-2h+1$, times the number of possible lattice paths $P$. This is exactly

$$(n-2h+1)\cdot\frac{n-2h+1}{n+1}\binom{n+1}{h},$$

the number of possible lattice paths $P$ being given by the first statement of Lemma 5. This expression equals exactly the expression (2.13). Thus, Theorem 2 is proved. □
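With (2.12) and (2.13) in hand, a quick numerical consistency check is that the eigenvalues, counted with multiplicity, add up to the trace of the density matrix $\zeta_n(u)$, i.e. to 1. The sketch below assumes $u<1$ and works with $\log\lambda_h$ via `math.lgamma` to avoid overflow; it also checks that $\lambda_0>\lambda_1>\cdots>\lambda_{\lfloor n/2\rfloor}$ (from (2.12), $\lambda_{h+1}/\lambda_h=(1+h-u)/(1+n-h-u)<1$ for $2h<n$), the distinctness that is used in the proof of Lemma 4.

```python
from math import comb, exp, lgamma, log

def log_lambda(n, h, u):
    """log(lambda_h) from (2.12), computed with lgamma; requires u < 1."""
    return (-n * log(2) + lgamma(2.5 - u) + lgamma(2 + n - h - u) + lgamma(1 + h - u)
            - lgamma(2.5 + n / 2 - u) - lgamma(2 + n / 2 - u) - lgamma(1 - u))

def multiplicity(n, h):
    """(2.13): (n - 2h + 1)^2 / (n + 1) * C(n + 1, h), which is an integer."""
    return (n - 2 * h + 1) ** 2 * comb(n + 1, h) // (n + 1)

def trace_zeta(n, u):
    """Sum of the eigenvalues with multiplicity; should equal Tr zeta_n(u) = 1."""
    return sum(multiplicity(n, h) * exp(log_lambda(n, h, u)) for h in range(n // 2 + 1))

print(trace_zeta(10, 0.5))  # approximately 1.0
```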

Now we turn to the proofs of the Lemmas. 

Proof of Lemma 3. Let $h,s$ and $A,B$ be fixed, satisfying the restrictions in the statement of the Lemma. We have to show that

$$\zeta_n(u)\cdot v_{h,s}(A,B)=\lambda_h\,v_{h,s}(A,B).$$

Restricting our attention to the $I$-th component, we see from the definition (2.14) of $v_{h,s}(A,B)$ that we need to establish

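$$\sum_{\substack{X\subseteq A\\ Y\subseteq[n]\setminus(A\cup B),\ |Y|=s-h}}(-1)^{|X|}\,z_{I,X\cup X'\cup Y}=\begin{cases}\lambda_h(-1)^{|U|}&\text{if }I\text{ is of the form }U\cup U'\cup V\text{ for some }U\text{ and }V,\\ &\quad U\subseteq A,\ V\subseteq[n]\setminus(A\cup B),\ |V|=s-h,\\ 0&\text{otherwise.}\end{cases}\qquad(2.17)$$

We prove (2.17) by a case-by-case analysis. The first two cases cover the case "otherwise" in (2.17), the third case treats the first alternative in (2.17).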

Case 1. The cardinality of $I$ is different from $s$. As we observed earlier, the cardinality of any set $X\cup X'\cup Y$ which occurs as index on the left-hand side of (2.17) equals $s$.






CHRISTIAN KRATTENTHALER AND PAUL B. SLATER 



The cardinality of $I$, however, is different from $s$. As we observed in the Remark after Theorem 1, this implies that any coefficient $z_{I,X\cup X'\cup Y}$ on the left-hand side vanishes. Thus, (2.17) is proved in this case.

Case 2. The cardinality of $I$ equals $s$, but $I$ does not have the form $U\cup U'\cup V$ for any $U$ and $V$, $U\subseteq A$, $V\subseteq[n]\setminus(A\cup B)$, $|V|=s-h$. Now the sum on the left-hand side of (2.17) contains nonzero contributions. We have to show that they cancel each other. We do this by grouping summands in pairs, the sum of each pair being 0.

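Consider a set $X\cup X'\cup Y$ which occurs as index on the left-hand side of (2.17). Let $e$ be minimal such that

either: the $e$-th largest element of $A$ and the $e$-th largest element of $B$ are both in $I$,

or: the $e$-th largest element of $A$ and the $e$-th largest element of $B$ are both not in $I$.

That such an $e$ must exist is guaranteed by our assumptions about $I$. Now consider $X$ and $X'$. If the $e$-th largest element of $A$ is contained in $X$ then the $e$-th largest element of $B$ is not contained in $X'$, and vice versa. Define a new set $\bar X$ by adding to $X$ the $e$-th largest element of $A$ if it is not already contained in $X$, respectively by removing it from $X$ if it is contained in $X$. Then, it is easily checked that

$$z_{I,X\cup X'\cup Y}=z_{I,\bar X\cup\bar X'\cup Y},$$

because passing from $X\cup X'$ to $\bar X\cup\bar X'$ only exchanges the $e$-th largest element of $A$ for the $e$-th largest element of $B$ (or vice versa), and these two elements are either both in $I$ or both not in $I$; hence the number of common elements of $I$ and the index set does not change, and by the Remark after Theorem 1 this number determines the entry. On the other hand, we have $(-1)^{|\bar X|}=-(-1)^{|X|}$ since the cardinalities of $X$ and $\bar X$ differ by $\pm1$. Both facts combined give

$$z_{I,X\cup X'\cup Y}(-1)^{|X|}+z_{I,\bar X\cup\bar X'\cup Y}(-1)^{|\bar X|}=0.$$

Hence, we have found two summands on the left-hand side of (2.17) which cancel each other.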

Summarizing, this construction finds for any $X,Y$ sets $\bar X,Y$ such that the corresponding summands on the left-hand side of (2.17) cancel each other. Moreover, this construction applied to $\bar X,Y$ gives back $X,Y$. Hence, what the construction does is exactly what we claimed, namely it groups the summands into pairs which contribute 0 to the whole sum. Therefore the sum is 0, which establishes (2.17) in this case also.

Case 3. $I$ has the form $U\cup U'\cup V$ for some $U$ and $V$, $U\subseteq A$, $V\subseteq[n]\setminus(A\cup B)$, $|V|=s-h$. This assumption implies in particular that the cardinality of $I$ is $s$. From the Remark after the statement of Theorem 1 we know that in our situation $z_{I,X\cup X'\cup Y}$ depends only on the number of common elements in $I$ and $X\cup X'\cup Y$. Thus, the left-hand side in (2.17) reduces to

AT j,A; -1 l^l+^fc! )^ ^—r . ^-^^ 2.18 

,,,>o 2"r(| + |-n)r(2 + f-^)r(2 + fc-n)' 

where $N(j,k)$ is the number of sets $X\cup X'\cup Y$, for some $X$ and $Y$, $X\subseteq A$, $Y\subseteq[n]\setminus(A\cup B)$, $|Y|=s-h$, which have $s-k$ elements in common with $I$, and which have $h-j$ elements in common with $I\cap(A\cup B)=U\cup U'$. Clearly, we used expression (2.2) with $n_{ee}=s-k$ and $n_{\bar e\bar e}=n-s-k$.

To determine $N(j,k)$, note first that there are $\binom{h}{j}$ possible sets $X\cup X'$ which intersect $U\cup U'$ in exactly $h-j$ elements. Next, let us assume that we already made a choice for $X\cup X'$. In order to determine the number of possible sets $Y$ such that $X\cup X'\cup Y$ has



ASYMPTOTIC REDUNDANCIES 






$s-k$ elements in common with $I$, we have to choose $(s-k)-(h-j)=s-h+j-k$ elements from $V$, for which we have $\binom{s-h}{s-h+j-k}=\binom{s-h}{k-j}$ possibilities, and we have to choose $s-h-(s-h+j-k)=k-j$ elements from $[n]\setminus(I\cup A\cup B)$ to obtain a total number of $s$ elements, for which we have $\binom{n-s-h}{k-j}$ possibilities. Hence,

$$N(j,k)=\binom{h}{j}\binom{s-h}{k-j}\binom{n-s-h}{k-j}.\qquad(2.19)$$



So it remains to evaluate the double sum (2.18), using the expression (2.19) for $N(j,k)$.

We start by writing the sum over $j$ in (2.18) in hypergeometric notation.



^_l^\u\_^ r(| -u)T{2 + n-s~u)T{2 + s-u) 



2" r(2-M)r(2 + f -u)r(| + f 

{h - s)k {h-n + s)k 



u 



E 

A:=0 



-3-^2 



1 — h — k + s,l — h — k + n 



To the ${}_3F_2$ series we apply a transformation formula of Thomae (see e.g. [?, (3.1.1)]),



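$${}_3F_2\!\left[\begin{matrix}a,\ b,\ -m\\ d,\ e\end{matrix};1\right]=\frac{(-b+e)_m}{(e)_m}\,{}_3F_2\!\left[\begin{matrix}-m,\ b,\ -a+d\\ d,\ 1+b-e-m\end{matrix};1\right],\qquad(2.20)$$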



where $m$ is a nonnegative integer. We write the resulting ${}_3F_2$ again as a sum over $j$, then interchange sums over $k$ and $j$, and write the (now) inner sum over $k$ in hypergeometric notation. Thus we obtain



1 r(| - u) r(2 + n - s - u) r(2 + s - u) 
¥ r(| + f -M)r(2 + f -M)r(2-M) 



j=0 



[~h)j {l-h + s) 
(1), (2 - u), 



j — n + s, h 



2 + J 



;1 



u 



The ${}_2F_1$ series in this expression is terminating because $h-s$ is a nonpositive integer. Hence, it can be summed by means of Gauß' sum (2.10). Writing the remaining sum over $j$ in hypergeometric notation, the above expression becomes



-1 



1 



r(| 



u)r{2 + n - h ~ u)r{2 + s 



u 



—h, 1 — h + s 
2-h + s 



u 



2" r(| + f - m) r(2 + f - m) r(2 + s - /i - m) 

Again, the ${}_2F_1$ series is terminating and so is summable by means of (2.10). Thus, we get

$$(-1)^{|U|}\,\frac{1}{2^n}\,\frac{\Gamma(5/2-u)\,\Gamma(2+n-h-u)\,\Gamma(1+h-u)}{\Gamma(5/2+n/2-u)\,\Gamma(2+n/2-u)\,\Gamma(1-u)},$$

which is exactly the expression (2.12) for $\lambda_h$ times $(-1)^{|U|}$. This proves (2.17) in this case.



The proof of Lemma 3 is now complete. □
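The proof above relies on Thomae's transformation (2.20) and on terminating Gauß sums, and the proof of Theorem 7 below uses a terminating ${}_2F_1$ transformation as well. These identities can be spot-checked in exact rational arithmetic. The sketch below sums terminating series directly with `fractions.Fraction`; the parameter values are arbitrary non-integer rationals chosen so that no denominator vanishes. For Gauß' sum it checks the terminating (Chu–Vandermonde) form ${}_2F_1(a,-m;c;1)=(c-a)_m/(c)_m$, which is the case needed here.

```python
from fractions import Fraction as Fr

def poch(a, k):
    """Rising factorial (a)_k = a (a+1) ... (a+k-1)."""
    out = Fr(1)
    for i in range(k):
        out *= a + i
    return out

def hyper(upper, lower, z, m):
    """Terminating hypergeometric series, summed for k = 0..m.
    Only meaningful when one upper parameter equals -m (later terms vanish)."""
    total = Fr(0)
    for k in range(m + 1):
        term = Fr(z) ** k / poch(Fr(1), k)
        for a in upper:
            term *= poch(a, k)
        for b in lower:
            term /= poch(b, k)
        total += term
    return total

def thomae_holds(a, b, d, e, m):
    """Thomae's transformation (2.20)."""
    lhs = hyper([a, b, -m], [d, e], 1, m)
    rhs = poch(e - b, m) / poch(e, m) * hyper([-m, b, d - a], [d, 1 + b - e - m], 1, m)
    return lhs == rhs

def gauss_holds(a, c, m):
    """Terminating Gauss sum: 2F1(a, -m; c; 1) = (c - a)_m / (c)_m."""
    return hyper([a, -m], [c], 1, m) == poch(c - a, m) / poch(c, m)

def transformation_holds(a, c, z, m):
    """Terminating transformation 2F1(a,-m;c;z) = (c-a)_m/(c)_m 2F1(a,-m;1+a-c-m;1-z)."""
    lhs = hyper([a, -m], [c], z, m)
    rhs = poch(c - a, m) / poch(c, m) * hyper([a, -m], [1 + a - c - m], 1 - z, m)
    return lhs == rhs
```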



Proof of Lemma 4. We know from Lemma 3 that $v_{h,s}(P)$ lies in the eigenspace for the eigenvalue $\lambda_h$, with $\lambda_h$ being given in (2.12). The $\lambda_h$'s, $h=0,1,\dots,\lfloor n/2\rfloor$, are all distinct (by (2.12), $\lambda_{h+1}/\lambda_h=(1+h-u)/(1+n-h-u)<1$ for $2h<n$), so the corresponding eigenspaces are linearly independent. Therefore it suffices to show that for any fixed $h$ the set of vectors

$$\{v_{h,s}(P):h\le s\le n-h,\ P\text{ a ballot path from }(0,0)\text{ to }(n,n-2h)\}$$

is linearly independent.

On the other hand, a vector $v_{h,s}(A,B)$ lies in the space spanned by the standard unit vectors $e_T$ with $|T|=s$. Clearly, as $s$ varies, these spaces are linearly independent. Therefore, it suffices to show that for any fixed $h$ and $s$ the set of vectors

$$\{v_{h,s}(P):P\text{ a ballot path from }(0,0)\text{ to }(n,n-2h)\}$$

is linearly independent.

So, let us fix integers $h$ and $s$ with $0\le h\le s\le n-h$, and let us suppose that there is some vanishing linear combination

$$\sum_{P\text{ b.p. from }(0,0)\text{ to }(n,n-2h)}c_P\,v_{h,s}(P)=0.\qquad(2.21)$$

We have to establish that $c_P=0$ for all ballot paths $P$ from $(0,0)$ to $(n,n-2h)$.

We prove this fact by induction on the set of ballot paths from $(0,0)$ to $(n,n-2h)$. In order to make this more precise, we need to impose a certain order on the ballot paths. Given a ballot path $P$ from $(0,0)$ to $(n,n-2h)$, we define its front portion $F_P$ to be the portion of $P$ from the beginning up to and including $P$'s $h$-th up-step. For example, choosing $h=2$, the front portion of the ballot path in Figure 1 is the subpath from $(0,0)$ to $(3,1)$. Note that $F_P$ can be any ballot path starting in $(0,0)$ with $h$ up-steps and fewer than $h$ down-steps. We order such front portions lexicographically, in the sense that $F_1$ is before $F_2$ if and only if $F_1$ and $F_2$ agree up to some point and then $F_1$ continues with an up-step while $F_2$ continues with a down-step.

Now, here is what we are going to prove: Fix any possible front portion $F$. We shall show that $c_P=0$ for all $P$ with front portion $F_P$ equal to $F$, given that it is already known that $c_{P'}=0$ for all $P'$ with a front portion $F_{P'}$ that is before $F$. Clearly, by induction, this would prove $c_P=0$ for all ballot paths $P$ from $(0,0)$ to $(n,n-2h)$.

Let $F$ be a possible front portion, i.e., a ballot path starting in $(0,0)$ with exactly $h$ up-steps and fewer than $h$ down-steps. As we did earlier, label the steps of $F$ by $1,2,\dots$, and denote the set of labels corresponding to the down-steps of $F$ by $B_F$. We write $b$ for $|B_F|$, the number of all down-steps of $F$. Observe that then the total number of steps of $F$ is $h+b$.

Now, let $T$ be a fixed $(h-b)$-element subset of $\{h+b+1,h+b+2,\dots,n\}$. Furthermore, let $S$ be a set of the form $S=B_F\cup S_1\cup S_2$, where $S_1\subseteq T$ and $S_2\subseteq\{h+b+1,h+b+2,\dots,n\}\setminus T$, and such that $|S|=s$.

We consider the coefficient of $e_S$ on the left-hand side of (2.21). To determine this coefficient, we have to determine the coefficient of $e_S$ in $v_{h,s}(P)$, for all $P$. We may concentrate on those $P$ whose front portion $F_P$ is equal to or later than $F$, since our induction hypothesis says that $c_P=0$ for all $P$ with $F_P$ before $F$. So, let $P$ be a ballot path from $(0,0)$ to $(n,n-2h)$ with front portion equal to or later than $F$. We claim that the coefficient of $e_S$ in $v_{h,s}(P)$ is zero unless the set $B_P$ of down-steps of $P$ is contained in $S$.

Let the coefficient of $e_S$ in $v_{h,s}(P)$ be nonzero. To establish the claim, we first prove that the front portion $F_P$ of $P$ has to equal $F$. Suppose that this is not the case. Then the front portion of $P$ runs in parallel with $F$ for some time, say for the first $(m-1)$ steps, with some $m\le h+b$, and then $F$ continues with an up-step and $F_P$ continues with a down-step (recall that $F_P$ is equal to or later than $F$). By (2.14) we have

$$v_{h,s}(P)=\sum_{\substack{X\subseteq A_P\\ Y\subseteq[n]\setminus(A_P\cup B_P),\ |Y|=s-h}}(-1)^{|X|}\,e_{X\cup X'\cup Y}.\qquad(2.22)$$



We are assuming that the coefficient of $e_S$ in $v_{h,s}(P)$ is nonzero, therefore $S$ must be of the form $S=X\cup X'\cup Y$, with $X,Y$ as described in (2.22). We are considering the case that the $m$-th step of $F_P$ is a down-step, whence $m\in B_P$, while the $m$-th step of $F$ is an up-step, whence $m\notin B_F$. By definition of $S$, we have $S\cap\{1,2,\dots,h+b\}=B_F$, whence $m\notin S$.

Summarizing so far, we have $m\in B_P$, $m\notin S$, for some $m\le h+b$, and $S=X\cup X'\cup Y$ for some $X,Y$ as described in (2.22). In particular we have $m\notin X'$. Now recall that $X'$ is the "complement of $X$ in $B_P$". This says in particular that, if $m$ is the $i$-th largest element in $B_P$, then the $i$-th largest element of $A_P$, $a$ say, is an element of $X$, and so of $S$. By construction of $A_P$ and $B_P$, $a$ is smaller than $m$, so in particular $a<h+b$. As we already observed, there holds $S\cap\{1,2,\dots,h+b\}=B_F$, so we have $a\in B_F$, i.e., the $a$-th step of $F$ is a down-step. On the other hand, we assumed that $P$ and $F$ run in parallel for the first $(m-1)$ steps. Since $a\in A_P$, the set of up-steps of $P$, the $a$-th step of $P$ is an up-step. We have $a\le m-1$, therefore the $a$-th step of $F$ must be an up-step also. This is absurd. Therefore, given that the coefficient of $e_S$ in $v_{h,s}(P)$ is nonzero, the front portion $F_P$ of $P$ has to equal $F$.

Now, let $P$ be a ballot path from $(0,0)$ to $(n,n-2h)$ with front portion equal to $F$, and suppose that $S$ has the form $S=X\cup X'\cup Y$, for some $X,Y$ as described in (2.22). By definition of the front portion, the set of up-steps of $P$ has the property $A_P\cap\{1,2,\dots,h+b\}=\{1,2,\dots,h+b\}\setminus B_F$. Since $|B_F|=b$, these are the labels of exactly $h$ up-steps. Since the cardinality of $A_P$ is exactly $h$ by definition, we must have $A_P=\{1,2,\dots,h+b\}\setminus B_F$. Because of $S\cap\{1,2,\dots,h+b\}=B_F$, which we already used a number of times, $A_P$ and $S$ are disjoint, which in particular implies that $A_P$ and $X$ are disjoint. However, $X$ is a subset of $A_P$ by definition, so $X$ must be empty. This in turn implies that $X'=B_P$. This says nothing else but that the set $B_P$ of down-steps of $P$ equals $X'$ and so is contained in $S$. This establishes our claim.

In fact, we proved more. We saw that $S$ has the form $S=X\cup X'\cup Y$, with $X=\emptyset$. This implies that the coefficient of $e_S$ in $v_{h,s}(P)$, as given by (2.22), is actually $+1$. Comparison of coefficients of $e_S$ in (2.21) then gives

$$\sum_{\substack{P\text{ b.p. from }(0,0)\text{ to }(n,n-2h)\\ F_P=F,\ B_P\subseteq S}}c_P=0,\qquad(2.23)$$

for any $S=B_F\cup S_1\cup S_2$, where $S_1\subseteq T$ and $S_2\subseteq\{h+b+1,h+b+2,\dots,n\}\setminus T$, and such that $|S|=s$.

Now, we sum both sides of (2.23) over all such sets $S$, keeping the cardinality of $S_1$ and $S_2$ fixed, say $|S_1|=h-b-j$, enforcing $|S_2|=s-h+j$, for a fixed $j$, $0\le j\le h-b$. For a fixed ballot path $P$ from $(0,0)$ to $(n,n-2h)$, with front portion $F$, with $h-b-k$ down-steps in $T$, and hence with $k$ down-steps in $\{h+b+1,h+b+2,\dots,n\}\setminus T$, there are $\binom{k}{j}$ such sets $S_1$ containing all the $h-b-k$ down-steps of $P$ in $T$, and there are $\binom{n-2h-k}{n-h-s-j}$ such sets $S_2\subseteq\{h+b+1,h+b+2,\dots,n\}\setminus T$ containing all the $k$ down-steps of $P$ in $\{h+b+1,h+b+2,\dots,n\}\setminus T$. Therefore, summing up (2.23) gives

$$\sum_{k\ge0}\binom{k}{j}\binom{n-2h-k}{n-h-s-j}\sum_{\substack{P\text{ b.p. from }(0,0)\text{ to }(n,n-2h)\\ F_P=F,\ |B_P\cap T|=h-b-k\\ |B_P\cap(\{h+b+1,h+b+2,\dots,n\}\setminus T)|=k}}c_P=0,\qquad j=0,1,\dots,h-b.\qquad(2.24)$$

Denoting the inner sum in (2.24) by $C(k)$, we see that (2.24) represents a non-degenerate triangular system of linear equations for $C(0),C(1),\dots,C(h-b)$. Therefore, all the quantities $C(0),C(1),\dots,C(h-b)$ have to equal 0. In particular, we have $C(0)=0$. Now, $C(0)$ consists of just a single term $c_P$, with $P$ being the ballot path from $(0,0)$ to $(n,n-2h)$ with front portion $F$ and with the labels of the $h-b$ down-steps besides those of $F$ being exactly the elements of $T$. Therefore, we have $c_P=0$ for this ballot path. The set $T$ was an arbitrary $(h-b)$-subset of $\{h+b+1,h+b+2,\dots,n\}$. Thus, we have proved $c_P=0$ for any ballot path $P$ from $(0,0)$ to $(n,n-2h)$ with front portion $F$. This completes our induction proof. □
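Lemmas 4 and 5 together say that the $2^n$ vectors in (2.16) form a basis. For small $n$ this can be confirmed by computing the rank of the matrix whose rows are these vectors. The sketch below assumes the bitmask encoding $T\mapsto\sum_{i\in T}2^{i-1}$ for coordinates (a choice made here, not in the paper), builds $v_{h,s}(P)$ with the same rank-pairing rule as in (2.14), and uses exact Gauss–Jordan elimination over the rationals, so no floating-point tolerance is involved.

```python
from fractions import Fraction
from itertools import combinations, product

def ballot_paths(n, h):
    """Up/down words of length n with h down-steps that never go below 0."""
    result = []
    for word in product("UD", repeat=n):
        if word.count("D") != h:
            continue
        height = 0
        for step in word:
            height += 1 if step == "U" else -1
            if height < 0:
                break
        else:
            result.append(word)
    return result

def v_dense(n, h, s, path):
    """Dense v_{h,s}(P); coordinate T of [n] is stored at bitmask sum of 2^(i-1), i in T."""
    ups = [i for i, st in enumerate(path, 1) if st == "U"][:h]  # A_P
    downs = [i for i, st in enumerate(path, 1) if st == "D"]    # B_P
    pairs = list(zip(sorted(ups, reverse=True), sorted(downs, reverse=True)))
    rest = [i for i in range(1, n + 1) if i not in ups and i not in downs]
    vec = [0] * (2 ** n)
    for choice in product((0, 1), repeat=h):
        # choice[i] == 1: the A-element of pair i lies in X; otherwise its B-partner lies in X'
        members = [a if c else b for (a, b), c in zip(pairs, choice)]
        sign = (-1) ** sum(choice)
        for Y in combinations(rest, s - h):
            mask = sum(1 << (i - 1) for i in members + list(Y))
            vec[mask] += sign
    return vec

def all_basis_vectors(n):
    """The set (2.16)."""
    return [v_dense(n, h, s, P)
            for h in range(n // 2 + 1)
            for s in range(h, n - h + 1)
            for P in ballot_paths(n, h)]

def exact_rank(rows):
    """Rank over the rationals via Gauss-Jordan elimination with Fractions."""
    mat = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(mat[0]) if mat else 0):
        pivot = next((r for r in range(rank, len(mat)) if mat[r][col] != 0), None)
        if pivot is None:
            continue
        mat[rank], mat[pivot] = mat[pivot], mat[rank]
        for r in range(len(mat)):
            if r != rank and mat[r][col] != 0:
                factor = mat[r][col] / mat[rank][col]
                mat[r] = [x - factor * y for x, y in zip(mat[r], mat[rank])]
        rank += 1
    return rank
```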

Proof of Lemma 5. That the number of ballot paths from $(0,0)$ to $(n,n-2h)$ equals $\frac{n-2h+1}{n+1}\binom{n+1}{h}$ is a classical combinatorial result (see e.g. [?, Theorem 1 with $t=1$]). From this it follows that the total number of vectors in the set (2.16) is

$$\sum_{h=0}^{\lfloor n/2\rfloor}\frac{(n-2h+1)^2}{n+1}\binom{n+1}{h}.\qquad(2.25)$$

To evaluate this sum, note that the summand is invariant under the substitution $h\to n+1-h$. Therefore, extending the range of summation in (2.25) to $h=0,1,\dots,n+1$ and dividing the result by 2 gives the same value. (For odd $n$ the additional term with $h=(n+1)/2$ vanishes.) So, the cardinality of the set (2.16) is also given by

$$\frac12\sum_{h=0}^{n+1}\frac{(n-2h+1)^2}{n+1}\binom{n+1}{h}.$$

The reader will not have any difficulty in splitting this sum into three parts so that each part can be summed by means of the binomial theorem. (Computer algebra systems like Maple or Mathematica do this automatically.) The result is exactly $2^n$, as was claimed. □
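The three parts mentioned above come from $(N-2h)^2=N^2-4Nh+4h^2$ with $N=n+1$, together with $\sum_h\binom Nh=2^N$, $\sum_h h\binom Nh=N2^{N-1}$ and $\sum_h h^2\binom Nh=N(N+1)2^{N-2}$. These give $\sum_{h=0}^{N}(N-2h)^2\binom Nh=N2^N$, and hence $\frac{1}{2N}\cdot N2^N=2^n$. A short exact check of both forms of the count, as a sketch:

```python
from fractions import Fraction
from math import comb

def summand(n, h):
    """(n - 2h + 1)^2 / (n + 1) * C(n + 1, h), kept exact."""
    return Fraction((n - 2 * h + 1) ** 2 * comb(n + 1, h), n + 1)

def total_25(n):
    """The sum (2.25) over h = 0..floor(n/2)."""
    return sum(summand(n, h) for h in range(n // 2 + 1))

def total_extended(n):
    """Half of the sum over h = 0..n+1, the symmetrised form used in the proof."""
    return sum(summand(n, h) for h in range(n + 2)) / 2

print([total_25(n) for n in range(1, 6)])  # [2, 4, 8, 16, 32] (as Fractions)
```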

In fact, Theorem 2 can be generalized to a wider class of matrices.
Theorem 6. Let $\zeta_n(u)=(z_{I,J})_{I,J\subseteq[n]}$ be the $2^n\times2^n$ matrix defined by

ZlJ := ^ngg.ng^.^ ^ n-nee-n^i ' ^^^^^ ~ ^^^)' 

where $n_{ee}$, etc., have the same meaning as earlier, and where $f(x)$ is a function of $x$ which is symmetric, i.e., $f(x)=f(-x)$. Then, the eigenvalues of $\zeta_n(u)$ are

$$\lambda_{h,s}=f(n-2s)\,\frac{\Gamma(2+n-h-u)\,\Gamma(1+h-u)}{\Gamma(2+n-s-u)\,\Gamma(2+s-u)\,\Gamma(1-u)},\qquad0\le h\le s\le n-h,\qquad(2.26)$$

with respective multiplicities

$$\frac{n-2h+1}{n+1}\binom{n+1}{h},\qquad(2.27)$$

independent of $s$.

Proof. The above proof of Theorem 2 has to be adjusted only insignificantly to yield a proof of Theorem 6. In particular, the vector $v_{h,s}(A,B)$ as defined in (2.14) is an eigenvector for $\lambda_{h,s}$, for any two disjoint $h$-element subsets $A$ and $B$ of $[n]$, and the set (2.16) is a basis of eigenvectors for $\zeta_n(u)$. □

2.3. The relative entropies of $\bigotimes^n\rho$ with respect to the Bayesian density matrices $\zeta_n(u)$. We now apply the preceding results to compute the relative entropy $S(\bigotimes^n\rho,\zeta_n(u))$ of $\bigotimes^n\rho$ with respect to $\zeta_n(u)$. Utilizing the definition (1.3) of relative entropy and employing the property that $S(\bigotimes^n\rho)=nS(\rho)$, it is given by

$$S\Big(\bigotimes^n\rho,\zeta_n(u)\Big)=-nS(\rho)-\operatorname{Tr}\Big(\bigotimes^n\rho\cdot\log\zeta_n(u)\Big).\qquad(2.28)$$



For the first term, for the entropy $S(\rho)$ of $\rho$, $\rho$ being given by (1.4), we have, using spherical coordinates $(r,\theta,\phi)$, so that $r=(x^2+y^2+z^2)^{1/2}$,

$$S(\rho)=-\frac{1-r}{2}\log\frac{1-r}{2}-\frac{1+r}{2}\log\frac{1+r}{2}.\qquad(2.29)$$

Concerning the second term in (2.28), we have the following theorem.

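Theorem 7. Let $\zeta_n(u)=(z_{I,J})_{I,J\subseteq[n]}$ be the matrix with entries $z_{I,J}$ given in (2.2). Then, we have

$$\operatorname{Tr}\Big(\bigotimes^n\rho\cdot\log\zeta_n(u)\Big)=\sum_{h=0}^{\lfloor n/2\rfloor}\frac{n-2h+1}{n+1}\binom{n+1}{h}\,\frac{(1+r)^{n-h+1}(1-r)^h-(1+r)^h(1-r)^{n-h+1}}{2^{n+1}r}\,\log\lambda_h,\qquad(2.30)$$

with $\lambda_h$ as given in (2.12), and with $r=\sqrt{x^2+y^2+z^2}$.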
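One consequence of the argument for Theorem 7 is easy to test: if $\log\zeta_n(u)$ were replaced by the identity matrix, the same computation would give $\operatorname{Tr}(\bigotimes^n\rho)=1$, so the coefficients of $\log\lambda_h$ in (2.30) should be nonnegative and sum to 1. The sketch below checks this in plain floating point (adequate for moderate $n$ only); the function name `trace_weight` is illustrative.

```python
from math import comb

def trace_weight(n, h, r):
    """Coefficient of log(lambda_h) in (2.30), for 0 < r <= 1."""
    bracket = ((1 + r) ** (n - h + 1) * (1 - r) ** h
               - (1 + r) ** h * (1 - r) ** (n - h + 1))
    return (n - 2 * h + 1) * comb(n + 1, h) / (n + 1) * bracket / (2 ** (n + 1) * r)

def weights(n, r):
    return [trace_weight(n, h, r) for h in range(n // 2 + 1)]

# For n = 2 the weights are (3 + r^2)/4 and (1 - r^2)/4.
print(weights(2, 0.5))
```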

Before we move on to the proof, we note that Theorem 7 gives us the following expression for the relative entropy of $\bigotimes^n\rho$ with respect to $\zeta_n(u)$.

Corollary 8. The relative entropy $S(\bigotimes^n\rho,\zeta_n(u))$ of $\bigotimes^n\rho$ with respect to $\zeta_n(u)$ equals

$$\frac n2(1-r)\log\frac{1-r}{2}+\frac n2(1+r)\log\frac{1+r}{2}-\sum_{h=0}^{\lfloor n/2\rfloor}\frac{n-2h+1}{n+1}\binom{n+1}{h}\,\frac{(1+r)^{n-h+1}(1-r)^h-(1+r)^h(1-r)^{n-h+1}}{2^{n+1}r}\,\log\lambda_h,\qquad(2.31)$$

with $\lambda_h$ as given in (2.12), and with $r=\sqrt{x^2+y^2+z^2}$.
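Corollary 8 can be turned into a direct numerical routine. The Python sketch below evaluates (2.31), and also the symmetrised form (2.40) that appears later in the proof of Theorem 9. It checks two facts that any correct evaluation must satisfy. First, for $n=1$ the matrix $\zeta_1(u)$ is $\tfrac12 I$ (one eigenvalue $\lambda_0=\tfrac12$ of multiplicity 2), so the relative entropy equals $\log2-S(\rho)$. Second, quantum relative entropy is never negative. The code uses plain floats and is meant for moderate $n$; all function names are illustrative.

```python
from math import comb, lgamma, log

def log_lambda(n, h, u):
    """log(lambda_h) from (2.12); requires u < 1."""
    return (-n * log(2) + lgamma(2.5 - u) + lgamma(2 + n - h - u) + lgamma(1 + h - u)
            - lgamma(2.5 + n / 2 - u) - lgamma(2 + n / 2 - u) - lgamma(1 - u))

def entropy_qubit(r):
    """S(rho) of (2.29): rho has eigenvalues (1 - r)/2 and (1 + r)/2."""
    return -sum(p * log(p) for p in ((1 - r) / 2, (1 + r) / 2) if p > 0)

def trace_weight(n, h, r):
    """Coefficient of log(lambda_h) in (2.30)/(2.31), for 0 < r <= 1."""
    bracket = ((1 + r) ** (n - h + 1) * (1 - r) ** h
               - (1 + r) ** h * (1 - r) ** (n - h + 1))
    return (n - 2 * h + 1) * comb(n + 1, h) / (n + 1) * bracket / (2 ** (n + 1) * r)

def trace_term(n, r, u):
    """Tr(rho^{(x)n} log zeta_n(u)) by (2.30)."""
    return sum(trace_weight(n, h, r) * log_lambda(n, h, u) for h in range(n // 2 + 1))

def trace_term_symmetric(n, r, u):
    """The same quantity in the form (2.40), summing over h = 0..n+1."""
    return sum((n - 2 * h + 1) * comb(n + 1, h) / (n + 1)
               * (1 + r) ** (n - h + 1) * (1 - r) ** h * log_lambda(n, h, u)
               for h in range(n + 2)) / (2 ** (n + 1) * r)

def rel_entropy(n, r, u):
    """(2.31): S(rho^{(x)n}, zeta_n(u)) = -n S(rho) - Tr(rho^{(x)n} log zeta_n(u))."""
    return -n * entropy_qubit(r) - trace_term(n, r, u)
```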
Proof of Theorem 7. One way of determining the trace of a linear operator $L$ is to choose a basis of the vector space, $\{v_I:I\subseteq[n]\}$ say, write the action of $L$ on the basis elements in the form

$$Lv_I=c_Iv_I+\text{linear combination of }v_J\text{'s},\ J\ne I,$$

and then form the sum $\sum_Ic_I$ of the "diagonal" coefficients, which gives exactly the trace of $L$.

Clearly, we choose as a basis our set (2.16) of eigenvectors for $\zeta_n(u)$. To determine the action of $\bigotimes^n\rho\cdot\log\zeta_n(u)$ we need only to find the action of $\bigotimes^n\rho$ on the vectors in the set (2.16). We claim that this action can be described as

$$\Big(\bigotimes^n\rho\Big)\cdot v_{h,s}(P)=\bigg(\frac{1}{2^n}\sum_{k\ge j\ge0}(-1)^j\binom{h}{j}\binom{s-h}{k-j}\binom{n-s-h}{k-j}(1+z)^{s-k}(x^2+y^2)^k(1-z)^{n-s-k}\bigg)\,v_{h,s}(P)$$
$$\qquad+\ \text{linear combination of eigenvectors }v_{h',s'}(P')\text{ with }s'\ne s,\qquad(2.32)$$

for any basis vector $v_{h,s}(P)$ in (2.16).

To see this, consider the $I$-th component of $(\bigotimes^n\rho)\cdot v_{h,s}(P)$, i.e., the coefficient of $e_I$ in $(\bigotimes^n\rho)\cdot v_{h,s}(P)$, $I\subseteq[n]$. By the definition (2.14) of $v_{h,s}(P)$ it equals

$$\sum_{\substack{X\subseteq A_P\\ Y\subseteq[n]\setminus(A_P\cup B_P),\ |Y|=s-h}}R_{I,X\cup X'\cup Y}\,(-1)^{|X|},\qquad(2.33)$$

where $R_{I,J}$ denotes the $(I,J)$-entry of $\bigotimes^n\rho$. (Recall that $R_{I,J}$ is given explicitly in (2.1).)
Now, it should be observed that we did a similar calculation already, namely in the proof of Lemma 3. In fact, the expression (2.33) is almost identical with the left-hand side of (2.17). The essential difference is that $z_{I,J}$ is replaced by $R_{I,J}$ for all $J$ (the nonessential difference is that $A,B$ are replaced by $A_P,B_P$, respectively). Therefore, we can partially rely upon what was done in the proof of Lemma 3. We distinguish between the same cases as in the proof of Lemma 3.

Case 1. The cardinality of $I$ is different from $s$. We do not have to worry about this case, since $e_I$ then lies in the span of the vectors $v_{h',s'}(P')$ with $s'\ne s$, which is taken care of in (2.32).

Case 2. The cardinality of $I$ equals $s$, but $I$ does not have the form $U\cup U'\cup V$ for any $U$ and $V$, $U\subseteq A_P$, $V\subseteq[n]\setminus(A_P\cup B_P)$, $|V|=s-h$. Essentially the same arguments as those in Case 2 in the proof of Lemma 3 show that the term (2.33) vanishes for this choice of $I$. Of course, one has to use the explicit expression (2.1) for $R_{I,J}$.

Case 3. $I$ has the form $U\cup U'\cup V$ for some $U$ and $V$, $U\subseteq A_P$, $V\subseteq[n]\setminus(A_P\cup B_P)$, $|V|=s-h$. In Case 3 in the proof of Lemma 3 we observed that there are $N(j,k)$ sets $X\cup X'\cup Y$, for some $X$ and $Y$, $X\subseteq A_P$, $Y\subseteq[n]\setminus(A_P\cup B_P)$, $|Y|=s-h$, which have $s-k$ elements in common with $I$, and which have $h-j$ elements in common with $I\cap(A_P\cup B_P)=U\cup U'$, where $N(j,k)$ is given by (2.19). Then, using the explicit expression (2.1) for $R_{I,J}$, it is straightforward to see that the expression (2.33) equals

$$(-1)^{|U|}\,\frac{1}{2^n}\sum_{k\ge j\ge0}(-1)^j\binom{h}{j}\binom{s-h}{k-j}\binom{n-s-h}{k-j}(1+z)^{s-k}(x^2+y^2)^k(1-z)^{n-s-k}$$

in this case. This establishes (2.32).

Now we are in the position to write down an expression for the trace of $\bigotimes^n\rho\cdot\log\zeta_n(u)$. By Theorem 2 and by (2.32) we have

$$\Big(\bigotimes^n\rho\cdot\log\zeta_n(u)\Big)\cdot v_{h,s}(P)=\frac{\log\lambda_h}{2^n}\sum_{k\ge j\ge0}(-1)^j\binom{h}{j}\binom{s-h}{k-j}\binom{n-s-h}{k-j}(1+z)^{s-k}(x^2+y^2)^k(1-z)^{n-s-k}\cdot v_{h,s}(P)$$
$$\qquad+\ \text{linear combination of eigenvectors }v_{h',s'}(P')\text{ with }s'\ne s.\qquad(2.34)$$

From what was said at the beginning of this proof, in order to obtain the trace of $\bigotimes^n\rho\cdot\log\zeta_n(u)$, we have to form the sum of all the "diagonal" coefficients in (2.34). Using the first statement of Lemma 5 and replacing $x^2+y^2$ by $r^2-z^2$, we see that it is

$$\sum_{h=0}^{\lfloor n/2\rfloor}\frac{n-2h+1}{n+1}\binom{n+1}{h}\,\frac{\log\lambda_h}{2^n}\sum_{s=h}^{n-h}\sum_{k\ge j\ge0}(-1)^j\binom{h}{j}\binom{s-h}{k-j}\binom{n-s-h}{k-j}(1+z)^{s-k}(r^2-z^2)^k(1-z)^{n-s-k}.\qquad(2.35)$$



In order to see that this expression equals (2.30), we have to prove

$$\sum_{s=h}^{n-h}\sum_{j=0}^{h}\sum_{k=j}^{s}(-1)^j\binom{h}{j}\binom{s-h}{k-j}\binom{n-s-h}{k-j}(1+z)^{s-k}(r^2-z^2)^k(1-z)^{n-s-k}=\frac{1}{2r}\Big((1+r)^{n+1-h}(1-r)^h-(1+r)^h(1-r)^{n+1-h}\Big).\qquad(2.36)$$



We start with the left-hand side of (2.36) and write the inner sum in hypergeometric notation, thus obtaining

(1 _ zy-'-\i + zy~^ - z^y^-^,F, - ^ Y' ^-^^ 

s=h j=0 ^ L 



1-^2 



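To the ${}_2F_1$ series we apply the transformation formula [?, (1.8.10), terminating form]

$${}_2F_1\!\left[\begin{matrix}a,\ -m\\ c\end{matrix};z\right]=\frac{(c-a)_m}{(c)_m}\,{}_2F_1\!\left[\begin{matrix}a,\ -m\\ 1+a-c-m\end{matrix};1-z\right],$$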



where $m$ is a nonnegative integer. We write the resulting ${}_2F_1$ series again as a sum over $k$. In the resulting expression we exchange sums so that the sum over $j$ becomes the innermost sum. Thus, we obtain



n—h s—h 



EEli---')'!!- 



\n—s—kf', I \s—k 

z) (l + z) 



s=h k=0 



{h - s)k{n - h - s + l)s-h {h-n + s)k 
{l)k{l)s-h {2h-n)k 



j=0 



j7 V 1 - 
Clearly, the innermost sum can be evaluated by the binomial theorem. Then, we interchange sums over $s$ and $k$. The expression that results is

. _ — IT);— 



k=0 

s=o ^ / ^ 

Again, we can apply the binomial theorem. Thus, we reduce our expression on the left-hand side of (2.36) to

Now, we replace (1 — r"^)^ by its binomial expansion 

Ef=o(-l)^©^^^ interchange sums 
over k and /, and write the (now) inner sum over k in hypergeometric notation. This 
gives 



(1)^(2/^ -n), 



• 2-^1 



7i + /-f,i + /i + /-f ■ 

2h + l-n 



Finally, this ${}_2F_1$ series can be summed by means of Gauß' summation (2.10). Simplifying, we have

[n/2\-h 

' ' ' ,2i 



2\h sr^ fn-2h+l\ 

l—n \ / 



which is easily seen to equal the right-hand side in (2.36). This completes the proof of the Theorem. □
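Identity (2.36) is a finite polynomial identity, so it can be checked exactly for small $n$ with rational values of $r$ and $z$. The sketch below does this with `fractions.Fraction`; the sample points are arbitrary choices with $|z|\le r\le1$, and `lhs_236`/`rhs_236` are illustrative names.

```python
from fractions import Fraction as Fr
from math import comb

def lhs_236(n, h, r, z):
    """Left-hand side of (2.36)."""
    total = Fr(0)
    for s in range(h, n - h + 1):
        for j in range(h + 1):
            for k in range(j, s + 1):
                c = comb(h, j) * comb(s - h, k - j) * comb(n - s - h, k - j)
                if c:  # nonzero binomials also guarantee nonnegative exponents below
                    total += ((-1) ** j * c * (1 + z) ** (s - k)
                              * (r * r - z * z) ** k * (1 - z) ** (n - s - k))
    return total

def rhs_236(n, h, r):
    """Right-hand side of (2.36); note that it does not depend on z."""
    return ((1 + r) ** (n + 1 - h) * (1 - r) ** h
            - (1 + r) ** h * (1 - r) ** (n + 1 - h)) / (2 * r)

print(lhs_236(3, 1, Fr(1, 2), Fr(1, 3)), rhs_236(3, 1, Fr(1, 2)))  # 3/2 3/2
```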

2.4. Asymptotics of the relative entropy of $\bigotimes^n\rho$ with respect to $\zeta_n(u)$. In the preceding subsection, we obtained in Corollary 8 the general formula (2.31) for the relative entropy of $\bigotimes^n\rho$ with respect to the Bayesian density matrix $\zeta_n(u)$. We now proceed to find its asymptotics for $n\to\infty$. We prove the following theorem.

Theorem 9. The asymptotics of the relative entropy $S(\bigotimes^n\rho,\zeta_n(u))$ of $\bigotimes^n\rho$ with respect to $\zeta_n(u)$ for a fixed $r=\sqrt{x^2+y^2+z^2}$ with $0<r<1$ is given by

$$\frac32\log n-\frac12-\frac32\log2-(1-u)\log(1-r^2)+\frac{1}{2r}\log\Big(\frac{1-r}{1+r}\Big)+\log\Gamma(1-u)-\log\Gamma(5/2-u)+O\Big(\frac1n\Big).\qquad(2.37)$$

In the case $r=0$, this means that the asymptotics is given by the expression (2.37) in the limit $r\downarrow0$, i.e., by

$$\frac32\log n-\frac32-\frac32\log2+\log\Gamma(1-u)-\log\Gamma(5/2-u)+O\Big(\frac1n\Big).\qquad(2.38)$$

For any fixed $\varepsilon>0$, the $O(.)$ term in (2.37) is uniform in $u$ and $r$ as long as $0\le r\le1-\varepsilon$.

For $r=1$ the asymptotics is given by

$$(2-u)\log n+(2u-3)\log2+\frac12\log\pi-\log\Gamma(5/2-u)+O\Big(\frac1n\Big).\qquad(2.39)$$

Also here, the $O(.)$ term is uniform in $u$.

Remark. It is instructive to observe that, although a comparison of (2.37) and (2.39) seems to suggest that the asymptotics of the relative entropy of $\bigotimes^n\rho$ with respect to $\zeta_n(u)$ behaves completely differently for $0<r<1$ and $r=1$, the two cases are really quite compatible. In fact, letting $r$ tend to 1 in (2.37) shows that (ignoring the error term) the asymptotic expression approaches $+\infty$ for $u<1/2$, $-\infty$ for $u>1/2$, and it approaches $\frac32\log n-\frac12-\frac52\log2+\frac12\log\pi$ for $u=1/2$. This indicates that, for $r=1$, the order of magnitude of the relative entropy of $\bigotimes^n\rho$ with respect to $\zeta_n(u)$ should be larger than $\frac32\log n$ if $u<1/2$, smaller than $\frac32\log n$ if $u>1/2$, and exactly $\frac32\log n$ if $u=1/2$. How much larger or smaller is precisely what formula (2.39) tells us: the order of magnitude is $(2-u)\log n$, and in the case $u=1/2$ the asymptotics is, in fact, $\frac32\log n-2\log2+\frac12\log\pi$.



Sketch of Proof of Theorem 9. We have to estimate the expression (2.31) for large $n$. Clearly, it suffices to concentrate on the sum in (2.31). Because of $\lambda_{n+1-h}=\lambda_h$, this sum can also be expressed as

$$\frac{1}{2^{n+1}r}\sum_{h=0}^{n+1}\frac{n-2h+1}{n+1}\binom{n+1}{h}(1+r)^{n-h+1}(1-r)^h\log\lambda_h.\qquad(2.40)$$

For $r=1$ this sum reduces to $\log\lambda_0$, $\lambda_0$ being given by (2.12). A straightforward application of Stirling's formula then leads to (2.39).
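For $r=1$ the relative entropy is exactly $-\log\lambda_0$, so (2.39) can be compared with the exact value directly. In the sketch below, `asymptotic_239` (a name chosen here) contains only the main terms of (2.39); by the duplication formula the difference equals $\log\Gamma(n+4-2u)-\log\Gamma(n+2-u)-(2-u)\log n$, which is of order $1/n$.

```python
from math import lgamma, log, pi

def log_lambda(n, h, u):
    """log(lambda_h) from (2.12); requires u < 1."""
    return (-n * log(2) + lgamma(2.5 - u) + lgamma(2 + n - h - u) + lgamma(1 + h - u)
            - lgamma(2.5 + n / 2 - u) - lgamma(2 + n / 2 - u) - lgamma(1 - u))

def rel_entropy_r1(n, u):
    """For r = 1 only the h = 0 term of (2.40) survives, so S = -log(lambda_0)."""
    return -log_lambda(n, 0, u)

def asymptotic_239(n, u):
    """Main terms of (2.39), without the O(1/n) remainder."""
    return (2 - u) * log(n) + (2 * u - 3) * log(2) + 0.5 * log(pi) - lgamma(2.5 - u)

for n in (10, 100, 1000, 10000):
    # the differences should shrink roughly like 1/n
    print(n, rel_entropy_r1(n, 0.5) - asymptotic_239(n, 0.5))
```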

From now on let $0<r<1$. We recall that $\lambda_h$ is given by (2.12). Consequently, we expand the logarithm in (2.40) according to the addition rule, and split the sum (2.40) into the corresponding parts. The individual parts can be summed by means of the binomial theorem, except for the parts which involve $\log\Gamma(1+h-u)$. (To be precise, they have to be split appropriately before the binomial theorem can be applied. Computer algebra systems like Maple or Mathematica do this automatically.)

In order to handle the terms which contain $\log\Gamma(1+h-u)$, we use Stirling's formula

$$\log\Gamma(z)=\Big(z-\frac12\Big)\log z-z+\frac12\log2+\frac12\log\pi+O\Big(\frac1z\Big).\qquad(2.41)$$

Again, after splitting, all the resulting sums can be evaluated by means of the binomial theorem, except for

h=0 \ ^ ) \ / 



(2.42) 

The asymptotics of this sum can now easily (if tediously) be determined by making use of a Taylor expansion of $\log(1+h-u)$ about the point $1+h-u=n(1-r)/2$, with sufficiently many terms.

If everything is put together, the result is (2.37). □
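For $0<r<1$, comparing the exact formula (2.31) with the main terms of (2.37) is a useful check on both. The sketch below evaluates the weights of (2.31) in log space, so that $n$ in the tens of thousands is unproblematic, and compares the result with (2.37) without its $O(1/n)$ term. The function names are illustrative.

```python
from math import exp, lgamma, log, log1p

def log_lambda(n, h, u):
    """log(lambda_h) from (2.12); requires u < 1."""
    return (-n * log(2) + lgamma(2.5 - u) + lgamma(2 + n - h - u) + lgamma(1 + h - u)
            - lgamma(2.5 + n / 2 - u) - lgamma(2 + n / 2 - u) - lgamma(1 - u))

def entropy_qubit(r):
    """S(rho) from (2.29)."""
    return -sum(p * log(p) for p in ((1 - r) / 2, (1 + r) / 2) if p > 0)

def rel_entropy(n, r, u):
    """(2.31) with the weights evaluated in log space; valid for 0 < r < 1."""
    ratio = (1 - r) / (1 + r)
    trace = 0.0
    for h in range(n // 2 + 1):
        log_w = (log(n - 2 * h + 1) - log(n + 1)
                 + lgamma(n + 2) - lgamma(h + 1) - lgamma(n + 2 - h)  # log C(n+1, h)
                 + (n - h + 1) * log1p(r) + h * log1p(-r)
                 + log1p(-ratio ** (n - 2 * h + 1))
                 - (n + 1) * log(2) - log(r))
        trace += exp(log_w) * log_lambda(n, h, u)
    return -n * entropy_qubit(r) - trace

def asymptotic_237(n, r, u):
    """Main terms of (2.37), without the O(1/n) remainder."""
    return (1.5 * log(n) - 0.5 - 1.5 * log(2) - (1 - u) * log(1 - r * r)
            + log((1 - r) / (1 + r)) / (2 * r) + lgamma(1 - u) - lgamma(2.5 - u))

for n in (100, 1000, 10000):
    print(n, rel_entropy(n, 0.5, 0.0) - asymptotic_237(n, 0.5, 0.0))
```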
2.5. Asymptotics of the von Neumann entropies of the Bayesian density matrices $\zeta_n(u)$. The main result of this section describes the asymptotics of the von Neumann entropy (1.10) of $\zeta_n(u)$. In view of the explicit description of the eigenvalues of $\zeta_n(u)$ and their multiplicities in Theorem 2, this entropy equals

$$S(\zeta_n(u))=-\sum_{h=0}^{\lfloor n/2\rfloor}\frac{(n-2h+1)^2}{n+1}\binom{n+1}{h}\lambda_h\log\lambda_h,\qquad(2.43)$$

with $\lambda_h$ being given by (2.12).
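Formula (2.43), and its symmetrised version (2.45) used in the proof below, can be evaluated directly. The following sketch checks the elementary bounds $0<S(\zeta_n(u))\le n\log2$ (the state lives in a $2^n$-dimensional space), the case $n=1$ where $\zeta_1(u)=\tfrac12I$ and hence $S=\log2$, and the agreement of the two forms. The function names are illustrative.

```python
from math import comb, exp, lgamma, log

def log_lambda(n, h, u):
    """log(lambda_h) from (2.12); requires u < 1."""
    return (-n * log(2) + lgamma(2.5 - u) + lgamma(2 + n - h - u) + lgamma(1 + h - u)
            - lgamma(2.5 + n / 2 - u) - lgamma(2 + n / 2 - u) - lgamma(1 - u))

def weighted_term(n, h, u):
    """(n - 2h + 1)^2/(n + 1) * C(n + 1, h) * lambda_h * log(lambda_h)."""
    ll = log_lambda(n, h, u)
    return (n - 2 * h + 1) ** 2 * comb(n + 1, h) / (n + 1) * exp(ll) * ll

def vn_entropy(n, u):
    """(2.43)."""
    return -sum(weighted_term(n, h, u) for h in range(n // 2 + 1))

def vn_entropy_symmetric(n, u):
    """(2.45): the sum over h = 0..n+1, halved, using lambda_{n+1-h} = lambda_h."""
    return -0.5 * sum(weighted_term(n, h, u) for h in range(n + 2))

print(vn_entropy(1, 0.0), log(2))  # the two values agree
```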

Theorem 10. The asymptotics of the von Neumann entropy $S(\zeta_n(u))$ of $\zeta_n(u)$ is given by

" ( 2(2-Ih1-„) + - 2«) - ^(1 - .)) + I log n + (-1 + 2„) log 2 

14 _ 207/ + 77/2 

^ +iog(r(i-7.))-iog(r(5/2-7.)) 



2 (2 - 7X) (1 - 7X) 



+ (2 - 27x)(7/;(5 -2u)-i;{l-u)) + 0[^\, (2.44) 



where $\psi(x)$ is the digamma function,

$$\psi(x)=\frac{d}{dx}\log\Gamma(x)=\frac{\Gamma'(x)}{\Gamma(x)}.$$



Sketch of Proof. We have to estimate the expression (2.43) for large $n$. We proceed as in the proof of Theorem 9. First we use the property $\lambda_{n+1-h}=\lambda_h$ to rewrite the sum (2.43) as

$$-\frac12\sum_{h=0}^{n+1}\frac{(n-2h+1)^2}{n+1}\binom{n+1}{h}\lambda_h\log\lambda_h.\qquad(2.45)$$

Next, while recalling that $\lambda_h$ is given by (2.12), we expand the logarithm in (2.45) according to the addition rule, and split the sum (2.45) into the corresponding parts. Here, the individual parts can be summed by means of Gauß' ${}_2F_1$ summation (2.10), except for the parts which involve $\log\Gamma(1+h-u)$. (Again, to be precise, they have to be split appropriately before the Gauß summation can be applied, which is done automatically by computer algebra systems like Maple or Mathematica.)

To handle the terms which contain $\log\Gamma(1+h-u)$, we invoke again Stirling's formula (2.41). After splitting, all the resulting sums can be evaluated by means of Gauß' ${}_2F_1$ summation (2.10), except for

$$\sum_{h=0}^{n+1}\frac{(n-2h+1)^2}{n+1}\binom{n+1}{h}\lambda_h\,(1/2+h-u)\log(1+h-u).\qquad(2.46)$$



Now, to get an asymptotic estimate for this sum, as $n$ tends to infinity, is not as obvious as it was for (2.42). The essential "trick" needed was kindly indicated to us by Peter Grabner: an asymptotic estimate (in fact, an exact result) for (2.46) with $\log(1+h-u)$ replaced by $\psi(1+h-u)$ can be obtained without difficulty (but with some amount of tedious calculation) by starting with the sum

^ (n + 1) V h 

1 T{5/2-u)T{2 + n-h-u)T{l + a + h-u) ^^ 
'2- T{5/2 + n/2-u)Ti2 + n/2-u)Til-u) ^ « + l^-^O 

evaluating it by applying Gauß' ${}_2F_1$ summation (2.10), differentiating both sides of the resulting equation with respect to $a$, and finally setting $a=0$. One then relates the result to (2.46) by using the asymptotic expansion $\psi(z)=\log z-\frac{1}{2z}+O(1/z^2)$. If everything is put together, the right-hand side of (2.44) is obtained. □




[Figure 2: Nonclassical/quantum term $\frac{1}{2r}\big((1-r)\log(1-r)-(1+r)\log(1+r)\big)$ in the quantum asymptotic redundancy (2.37), plotted against $r$]

3. Comparison of our asymptotic redundancies for the one-parameter family $q_u$ with those of Clarke and Barron

Let us first compare the formula (1.1) for the asymptotic redundancy of Clarke and Barron with the formula (2.37) derived here for the two-level quantum systems, in terms of the one-parameter family of probability densities $q_u$, $-\infty<u<1$, given in (1.7). Since the unit ball, or Bloch sphere, of such systems is three-dimensional, we set the dimension $d$ of the parameter space in (1.1) to 3. The quantum Fisher information matrix $I(\theta)$ for that case was taken to be (1.5), while the role of the probability density $w(\theta)$ is played by $q_u$. Under these substitutions, it was seen in the Introduction that formula (1.1) reduces to (1.8). We then see that, for $0<r<1$, formulas (2.37) and (1.8) coincide except for the presence of the monotonically increasing (nonclassical/quantum) term

$$-\frac12\log(1-r^2)-\frac{1}{2r}\log\left(\frac{1+r}{1-r}\right)=\frac{1}{2r}\Big((1-r)\log(1-r)-(1+r)\log(1+r)\Big)$$

in (2.37) (see Figure 2 for a plot of this term; $\log 2\approx0.693147$ "nats" of information equal one "bit"). This term would have to be replaced by $-1$, its limit as $r\to0$, to give (1.8). In particular, the order of magnitude, $\frac32\log n$, is precisely the same in both formulas. For the particular case $r=0$, the asymptotic formula (2.37) (see (2.38)) coincides precisely with (1.8).
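The nonclassical term of (2.37) is easy to tabulate. A minimal sketch (our illustration; function names are hypothetical) evaluates it in both of its equivalent forms, confirms the limits $-1$ as $r\to0$ and $-\log2$ at $r=1$, and checks monotonicity on a grid. Monotonicity also follows analytically, since the derivative of the term equals $(\operatorname{artanh} r-r)/r^2>0$.

```python
import math


def quantum_term(r: float) -> float:
    """(1/(2r)) ((1-r) log(1-r) - (1+r) log(1+r)) for 0 < r <= 1, in nats."""
    head = 0.0 if r == 1.0 else (1 - r) * math.log1p(-r)  # (1-r) log(1-r) -> 0 as r -> 1
    return (head - (1 + r) * math.log1p(r)) / (2 * r)


def quantum_term_log_form(r: float) -> float:
    """The same term as -(1/2) log(1-r^2) - (1/(2r)) log((1+r)/(1-r)), for 0 < r < 1."""
    return -0.5 * math.log1p(-r * r) - (math.log1p(r) - math.log1p(-r)) / (2 * r)


if __name__ == "__main__":
    for r in (1e-6, 0.25, 0.5, 0.75, 1.0):
        print(f"r={r}: {quantum_term(r):.6f}")
    print("limits:", -1.0, -math.log(2))
```

`log1p` is used so that the small-$r$ limit is not swamped by cancellation in $\log(1-r)$.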

In the case $r=1$, however, i.e., when we consider the boundary of the parameter space (represented by the unit sphere), the situation is slightly tricky. Since the formula of Clarke and Barron holds only for interior points of the parameter space, we cannot expect, in general, that our formula will resemble theirs. However, if the probability density $q_u$ is concentrated on the boundary of the sphere, then we may disregard the interior of the sphere and consider the boundary as the true parameter space. This parameter space is two-dimensional and consists entirely of interior points. Indeed, the probability density $q_u$ becomes concentrated on the boundary of the sphere as $u\to1$ since, as we remarked in the Introduction, in the limit $u\to1$ the distribution determined by $q_u$ tends to the uniform distribution over the boundary of the sphere. Let us, again, (naively) attempt to apply Clarke and Barron's formula (1.1) to that case. We parameterize the boundary of the sphere by polar coordinates $(\vartheta,\phi)$,

$$x=\sin\vartheta\cos\phi,\qquad y=\sin\vartheta\sin\phi,\qquad z=\cos\vartheta,\qquad 0\le\phi<2\pi,\quad 0\le\vartheta\le\pi.$$

In the limit $u\to1$, the probability density induced by $q_u$ is $\sin\vartheta/(4\pi)$, the density of the uniform distribution. Using [2^, eq. (8)] (see footnote 2), the quantum (symmetric logarithmic derivative) Fisher information matrix turns out to be

$$I(\vartheta,\phi)=\begin{pmatrix}1&0\\0&\sin^2\vartheta\end{pmatrix},\tag{3.1}$$

its determinant therefore equalling $\sin^2\vartheta$. So, setting $d=2$ and substituting $\sin\vartheta/(4\pi)$ for $w(\theta)$ and $\sin^2\vartheta$ for $\det I(\theta)$ in (1.1) gives $\log n+\log2-1$. On the other hand, our formula (2.39), for $u=1$, gives $\log n$. So, again, the two expressions differ only by a constant. In particular, the order of magnitude is again the same.
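Written out, and assuming (1.1) has the usual Clarke-Barron form $\frac d2\log\frac{n}{2\pi e}+\frac12\log\det I(\theta)+\log\frac{1}{w(\theta)}+o(1)$ (our restatement, since (1.1) is not reproduced in this section), the substitution $d=2$, $\det I=\sin^2\vartheta$, $w=\sin\vartheta/(4\pi)$ reads:

$$\begin{aligned}
\frac{d}{2}\log\frac{n}{2\pi e}+\frac12\log\det I(\vartheta)+\log\frac{1}{w(\vartheta)}
&=\log\frac{n}{2\pi e}+\frac12\log\sin^2\vartheta+\log\frac{4\pi}{\sin\vartheta}\\
&=\log n-\log(2\pi)-1+\log\sin\vartheta+\log(4\pi)-\log\sin\vartheta\\
&=\log n+\log2-1.
\end{aligned}$$

The $\vartheta$-dependence cancels, so the value is the same at every point of the sphere.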

Let us now focus on the asymptotic minimax redundancy (1.2) of Clarke and Barron. If in (1.2) we again set $d=3$, we obtain (1.9), which, numerically, is $\frac32\log n-1.96736+o(1)$. Clarke and Barron prove that this minimax value is attained only by the (classical) Jeffreys' prior. To derive its quantum counterpart (at least, a version restricted to the family $q_u$), we have to determine the behavior of

$$\min_{-\infty<u<1}\ \max_{0\le r\le1}\ S\big(\overset{n}{\otimes}\rho,\ \zeta_n(u)\big)\tag{3.2}$$

for $n\to\infty$. By Theorem ^ we know that, for large $n$, the relative entropy $S(\overset{n}{\otimes}\rho,\zeta_n(u))$ equals

$$\frac32\log n-\frac12-\frac32\log2-(1-u)\log(1-r^2)-\frac{1}{2r}\log\left(\frac{1+r}{1-r}\right)+\log\Gamma(1-u)-\log\Gamma(5/2-u),\tag{3.3}$$

up to an error of order $O(1/n)$, which is uniform in $u$ and $r$ as long as $0\le r\le1-\epsilon$ for any fixed $\epsilon>0$. Let us for the moment ignore the error term. Then what we have to do is to determine the minimax of the expression (3.3), that is,



$$\frac32\log n-\frac12-\frac32\log2+\min_{-\infty<u<1}\ \max_{0\le r\le1}\ f(r,u),\tag{3.4}$$

where

$$f(r,u)=-(1-u)\log(1-r^2)-\frac{1}{2r}\log\left(\frac{1+r}{1-r}\right)+\log\Gamma(1-u)-\log\Gamma(5/2-u).\tag{3.5}$$
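A direct transcription of (3.3) and (3.5) may help when experimenting with the minimax problem. In this sketch (our own; names hypothetical) logarithms are natural, $\log(1-r^2)$ is evaluated as $\log((1-r)(1+r))$ to limit cancellation near $r=1$, and the constant of (1.9) is recomputed as $-\frac32-\frac32\log2+\frac12\log\pi$, assuming (1.9) equals $\frac32\log\frac{n}{2\pi e}+\log\pi^2$. It also checks the boundary value of $f(r,1/2)$ and the unbounded growth of $f$ near $r=1$ when $u<1/2$, both discussed below.

```python
import math


def f(r: float, u: float) -> float:
    """f(r, u) of (3.5) for 0 < r < 1 and u < 1 (natural logarithms, i.e. nats)."""
    return (-(1 - u) * math.log((1 - r) * (1 + r))        # -(1-u) log(1 - r^2)
            - math.log((1 + r) / (1 - r)) / (2 * r)       # -(1/(2r)) log((1+r)/(1-r))
            + math.lgamma(1 - u) - math.lgamma(2.5 - u))  # log Gamma(1-u) - log Gamma(5/2-u)


def relative_entropy_asymptotic(n: int, r: float, u: float) -> float:
    """(3.3) without its O(1/n) error term."""
    return 1.5 * math.log(n) - 0.5 - 1.5 * math.log(2) + f(r, u)


# Constant of the Clarke-Barron minimax (1.9) for d = 3: -3/2 - (3/2) log 2 + (1/2) log(pi)
CB_MINIMAX_CONST = -1.5 - 1.5 * math.log(2) + 0.5 * math.log(math.pi)

if __name__ == "__main__":
    print(CB_MINIMAX_CONST)                     # about -1.96736
    print(f(1 - 1e-9, 0.5))                     # approaches -log 2 + (1/2) log(pi)
    print(f(1 - 1e-6, 0.4), f(1 - 1e-12, 0.4))  # keeps growing as r -> 1 when u < 1/2
```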

This is an easy task. First of all, if $u<1/2$, then the function $f(r,u)$ is unbounded as $r\to1$. Hence, to determine the minimax, we can ignore that range of $u$. If $u=1/2$, then $f(r,u)$ is maximal at $r=1$, where it attains the value $-\log2+\frac12\log\pi\approx-0.120782$. On the other hand, if $u>1/2$, then $f(r,u)$ attains its maximum in the interior of the interval $0<r<1$. To determine this maximum, we differentiate $f(r,u)$ with respect to $r$ and obtain

$$\frac{\partial f}{\partial r}=\frac{2r^2-1}{r(1-r^2)}-\frac{2ru}{1-r^2}+\frac{1}{2r^2}\log\left(\frac{1+r}{1-r}\right).$$

Equating this to 0 gives

$$u=1-\frac{1}{2r^2}+\frac{1-r^2}{4r^3}\log\left(\frac{1+r}{1-r}\right).\tag{3.6}$$
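The derivative and the relation (3.6) can be checked against finite differences. In the sketch below (ours; names hypothetical), `df_dr` is the closed-form partial derivative and `u_of_r` is (3.6); substituting `u_of_r(r)` into `df_dr` should give zero up to rounding.

```python
import math


def f(r, u):
    """f(r, u) of (3.5)."""
    return (-(1 - u) * math.log((1 - r) * (1 + r))
            - math.log((1 + r) / (1 - r)) / (2 * r)
            + math.lgamma(1 - u) - math.lgamma(2.5 - u))


def df_dr(r, u):
    """Closed-form partial derivative of f with respect to r."""
    big_l = math.log((1 + r) / (1 - r))
    return (2 * r * r - 1) / (r * (1 - r * r)) - 2 * r * u / (1 - r * r) + big_l / (2 * r * r)


def u_of_r(r):
    """(3.6): the value of u for which r is a critical point of f(., u)."""
    big_l = math.log((1 + r) / (1 - r))
    return 1 - 1 / (2 * r * r) + (1 - r * r) / (4 * r ** 3) * big_l


if __name__ == "__main__":
    h = 1e-6
    print((f(0.7 + h, 0.6) - f(0.7 - h, 0.6)) / (2 * h), df_dr(0.7, 0.6))
    print(u_of_r(0.961574))  # about 0.542593
```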

Now we have to express $r$ in terms of $u$, say $r=r(u)$, substitute this into $f(r,u)$, and determine $\min_{1/2<u<1}f(r(u),u)$. Equivalently, however, we can express $u$ in terms of $r$, say $u=u(r)$ (as was done in (3.6)), substitute this into $f(r,u)$, and determine $\min_{0<r<1}f(r,u(r))$. To do so, we differentiate $f(r,u(r))$ with respect to $r$, equate the result to 0, and solve for $r$. Numerically, the result is $r\approx0.961574$. Substituting this back into (3.6), we obtain $u\approx0.542593$. The value of $f(r,u)$ at these values of $r$ and $u$ is $-0.184320$. This is smaller than the value found above ($-0.120782$) for $u=1/2$, so that particular value of $u$ is not of concern for the minimax either.
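The minimization along the curve $u=u(r)$ of (3.6) can be reproduced with a one-dimensional golden-section search. This sketch is our own illustration, not the authors' computation; it assumes, as the argument above implies, that $r\mapsto f(r,u(r))$ is unimodal on the chosen interval $(1/2,\,1)$, which is our choice of bracket.

```python
import math


def f(r, u):
    """f(r, u) of (3.5)."""
    return (-(1 - u) * math.log((1 - r) * (1 + r))
            - math.log((1 + r) / (1 - r)) / (2 * r)
            + math.lgamma(1 - u) - math.lgamma(2.5 - u))


def u_of_r(r):
    """(3.6): u expressed in terms of the interior maximizer r."""
    big_l = math.log((1 + r) / (1 - r))
    return 1 - 1 / (2 * r * r) + (1 - r * r) / (4 * r ** 3) * big_l


def golden_section_min(func, lo, hi, iters=100):
    """Golden-section search for the minimizer of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = func(c), func(d)
    for _ in range(iters):
        if fc < fd:  # minimizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = func(c)
        else:        # minimizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = func(d)
    return (a + b) / 2


# Minimize r -> f(r, u(r)), the value of the interior maximum, over r in (1/2, 1).
r_star = golden_section_min(lambda r: f(r, u_of_r(r)), 0.5, 1 - 1e-9)
u_star = u_of_r(r_star)
f_star = f(r_star, u_star)
MINIMAX_CONST = -0.5 - 1.5 * math.log(2) + f_star  # constant term of the minimax redundancy

if __name__ == "__main__":
    print(r_star, u_star, f_star)  # compare with .961574, .542593, -0.184320
    print(MINIMAX_CONST)           # compare with -1.72404
```

Searching over $r$ rather than $u$ follows the text's remark that $u(r)$ is explicit while $r(u)$ is not.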

In the above, we ignored the error term. In fact, as is not difficult to see, since the error term is uniform in $u$ and $r$ as long as $0\le r\le1-\epsilon$ for any fixed $\epsilon>0$, it is legitimate to do so. To be precise, the asymptotic minimax is the result above, subject to an error of $o(1)$; that is, it is the value of (3.3) for $r\approx0.961574$ and $u\approx0.542593$, namely $\frac32\log n-1.72404+o(1)$. For $u=1/2$, on the other hand, the asymptotic maximum of the redundancy (2.31) (which, by the considerations above, is (3.3) for $r=1$) equals $\frac32\log n-\frac12-\frac52\log2+\frac12\log\pi+o(1)\approx\frac32\log n-1.66050+o(1)$. We must, therefore, conclude that, in contrast to the classical case [17, 18], our trial candidate ($q_{0.5}$) for the quantum counterpart of Jeffreys' prior does not exactly achieve the minimax redundancy, although the prior $q_{0.542593}$ is remarkably close to $q_{0.5}$, the hypothesized "quantum Jeffreys' prior" of [5^, 53].



We now turn to the asymptotic maximin redundancy. Clarke and Barron [17, 18] prove that the maximin redundancy is also attained asymptotically by the Jeffreys' prior. To derive the quantum counterpart of the maximin redundancy within our analytical framework, we would have to calculate

$$\max_{w}\ \min_{Q_n}\int_{x^2+y^2+z^2\le1}S\big(\overset{n}{\otimes}\rho,\ Q_n\big)\,w(x,y,z)\,dx\,dy\,dz,\tag{3.7}$$

where $Q_n$ varies over the $(2^{2n}-1)$-dimensional convex set of $2^n\times2^n$ density matrices and $w$ varies over all probability densities on the unit ball. As we already mentioned in the Introduction, in the classical case, due to a result of Aitchison [0, pp. 549/550], the minimum is achieved by taking $Q_n$ to be the Bayes estimator, i.e., the average, with respect to the given probability distribution, of all probability densities in the family under consideration. The same assertion holds in the quantum domain; for the sake of completeness, we include the proof in the Appendix. We can, thus, take the quantum analogue of the Bayes estimator to be the Bayesian density matrix $\zeta_n(u)$; that is, we set $Q_n=\zeta_n(u)$ in (3.7). Let us, for the moment, restrict the possible $w$'s over which the maximum is taken to the family $q_u$, $-\infty<u<1$. Thus, we consider



$$\max_{-\infty<u<1}\int_{x^2+y^2+z^2\le1}S\big(\overset{n}{\otimes}\rho,\ \zeta_n(u)\big)\,q_u(x,y,z)\,dx\,dy\,dz.\tag{3.8}$$



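By the definition (1.3) of relative entropy, we have

$$\begin{aligned}
S\big(\overset{n}{\otimes}\rho,\zeta_n(u)\big)&=\operatorname{Tr}\big(\overset{n}{\otimes}\rho\,\log\overset{n}{\otimes}\rho\big)-\operatorname{Tr}\big(\overset{n}{\otimes}\rho\,\log\zeta_n(u)\big)\\
&=n\left(\frac{1-r}{2}\log\frac{1-r}{2}+\frac{1+r}{2}\log\frac{1+r}{2}\right)-\operatorname{Tr}\big(\overset{n}{\otimes}\rho\,\log\zeta_n(u)\big),
\end{aligned}$$

the second line being due to (2.29). Therefore, since $\zeta_n(u)$ is the $q_u$-average of $\overset{n}{\otimes}\rho$, we get

$$\begin{aligned}
\int_{x^2+y^2+z^2\le1}&S\big(\overset{n}{\otimes}\rho,\zeta_n(u)\big)\,q_u(x,y,z)\,dx\,dy\,dz\\
&=n\int_0^1\!\!\int_0^\pi\!\!\int_0^{2\pi}\left(\frac{1-r}{2}\log\frac{1-r}{2}+\frac{1+r}{2}\log\frac{1+r}{2}\right)r^2\sin\vartheta\,q_u\,d\phi\,d\vartheta\,dr-\operatorname{Tr}\big(\zeta_n(u)\log\zeta_n(u)\big)\\
&=-n\left(\frac{-7+5u}{2(2-u)(1-u)}+\psi(5-2u)-\psi(1-u)\right)+S(\zeta_n(u)).
\end{aligned}\tag{3.9}$$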
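The last line of (3.9) uses the average of $S(\rho)=-\frac{1+r}{2}\log\frac{1+r}{2}-\frac{1-r}{2}\log\frac{1-r}{2}$ under $q_u$, namely $\frac{-7+5u}{2(2-u)(1-u)}+\psi(5-2u)-\psi(1-u)$. A minimal cross-check (our own sketch; names hypothetical) compares this closed form with direct numerical integration against the radial density $4\pi r^2q_u$. It is restricted to $u\le0$ so that the weight stays bounded and Simpson's rule applies; for $u=0$ the average is exactly $1/3$.

```python
import math


def digamma(x):
    """psi(x) for x > 0 via recurrence and asymptotic series (accurate to about 1e-9)."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def mean_entropy_closed_form(u):
    """Average of S(rho) under q_u, as read off from the last line of (3.9)."""
    return (-7 + 5 * u) / (2 * (2 - u) * (1 - u)) + digamma(5 - 2 * u) - digamma(1 - u)


def qubit_entropy(r):
    """von Neumann entropy (nats) of the 2x2 density matrix with Bloch radius r."""
    s = 0.0
    for p in ((1 + r) / 2, (1 - r) / 2):
        if p > 0:
            s -= p * math.log(p)
    return s


def mean_entropy_numeric(u, m=20000):
    """Simpson's rule for int_0^1 S(rho) 4 pi r^2 q_u(r) dr; assumes u <= 0 and m even."""
    c = math.exp(math.lgamma(2.5 - u) - math.lgamma(1 - u)) / math.pi ** 1.5  # q_u constant
    total = 0.0
    for i in range(m + 1):
        r = i / m
        weight = 1 if i in (0, m) else (4 if i % 2 else 2)
        total += weight * qubit_entropy(r) * 4 * math.pi * c * r * r * max(0.0, 1 - r * r) ** (-u)
    return total / (3 * m)


if __name__ == "__main__":
    for u in (0.0, -1.0):
        print(u, mean_entropy_closed_form(u), mean_entropy_numeric(u))
```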



From Theorem 10, we know the asymptotics of the von Neumann entropy $S(\zeta_n(u))$. Hence, we find that the expression (3.9) is asymptotically equal to

$$\frac32\log n-\left(\frac72-2u\right)\log2-\frac{14-20u+7u^2}{2(2-u)(1-u)}+\log\Gamma(1-u)-\log\Gamma(5/2-u)+(2-2u)\big(\psi(5-2u)-\psi(1-u)\big)+O\!\left(\frac{1}{\sqrt n}\right).\tag{3.10}$$



We first have to perform the maximization required in (3.8), and then determine the asymptotics of the result. Owing to the form of the asymptotics in (3.10), we can, in fact, obtain the correct result by proceeding in the reverse order. That is, we first determine the asymptotics of $\int S(\overset{n}{\otimes}\rho,\zeta_n(u))\,q_u\,dx\,dy\,dz$, which we did in (3.10), and then we maximize the $u$-dependent part of (3.10) with respect to $u$ (ignoring the error term). (In Figure 3 we display this $u$-dependent part over the range $[-0.2,1]$.) Of course, we do the latter step by setting the first derivative of the $u$-dependent part of (3.10) with respect to $u$ equal to zero and solving for $u$. It turns out that this equation takes the appealingly simple form

$$2(1-u)^3\big(\psi'(1-u)-\psi'(5/2-u)\big)=1.\tag{3.11}$$

Numerically, we find this equation to have the solution $u\approx0.531267$, at which the asymptotic maximin redundancy assumes the value $\frac32\log n-1.77185+O(1/\sqrt n)$. For $u=1/2$, on the other hand, the asymptotic redundancy (3.10) is $\frac32\log n-2-\frac12\log2+\frac12\log\pi+O(1/\sqrt n)\approx\frac32\log n-1.77421+O(1/\sqrt n)$. Again, we must therefore conclude that, in contrast to the classical case [17, 18], our trial candidate ($q_{0.5}$) for the quantum counterpart of Jeffreys' prior cannot serve as a "reference prior" in the sense introduced by Bernardo [^, 10]. Moreover, again in contrast to the classical situation, we find that the minimax and the maximin are not identical (although they are remarkably close). The two distinct priors yielding these values ($q_{0.542593}$ and $q_{0.531267}$, respectively) are themselves remarkably close as well.
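The maximin computation can be reproduced by solving (3.11) with bisection. One can check that the $u$-derivative of the constant term of (3.10) equals $\big(2(1-u)^3(\psi'(1-u)-\psi'(5/2-u))-1\big)/\big(2(1-u)^2\big)$, which is why (3.11) is so simple. The sketch (ours; names hypothetical) uses standard recurrence-plus-series approximations of $\psi$ and $\psi'$, and the bracket $[0.5,\,0.6]$ is our choice, justified by the sign change of (3.11) there.

```python
import math


def digamma(x):
    """psi(x) for x > 0 via recurrence and asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def trigamma(x):
    """psi'(x) for x > 0 via psi'(x) = psi'(x + 1) + 1/x^2 and the asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7)
    return acc + inv + 0.5 * inv2 + inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))


def bayes_redundancy_const(u):
    """(3.10) minus (3/2) log n, without the O(1/sqrt(n)) term."""
    return (-(3.5 - 2 * u) * math.log(2)
            - (14 - 20 * u + 7 * u * u) / (2 * (2 - u) * (1 - u))
            + math.lgamma(1 - u) - math.lgamma(2.5 - u)
            + (2 - 2 * u) * (digamma(5 - 2 * u) - digamma(1 - u)))


def stationarity(u):
    """Left-hand side of (3.11) minus 1."""
    return 2 * (1 - u) ** 3 * (trigamma(1 - u) - trigamma(2.5 - u)) - 1


def bisect(func, lo, hi, tol=1e-12):
    """Plain bisection; assumes func(lo) and func(hi) have opposite signs."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


u_maximin = bisect(stationarity, 0.5, 0.6)

if __name__ == "__main__":
    print(u_maximin, bayes_redundancy_const(u_maximin))  # compare with .531267, -1.77185
    print(bayes_redundancy_const(0.5))                   # compare with -1.77421
```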

Figure 3. The $u$-dependent part of the asymptotic Bayes redundancy (3.10), in nats, for $-0.2\le u\le1$; the maximum is at $u\approx0.531267$.

Since they are mixtures of product states, the matrices $\zeta_n(u)$ are classically correlated (as opposed to EPR (Einstein-Podolsky-Rosen) correlated) [61]. Therefore, $S(\zeta_n(u))$ cannot be less than the von Neumann entropy of any reduced density matrix obtained from it through computation of partial traces (while, by subadditivity, it cannot exceed the sum of the entropies of the reduced density matrices of a partition). For positive integers $n_1+n_2+\cdots=n$, the corresponding reduced density matrices are simply $\zeta_{n_1}(u),\zeta_{n_2}(u),\dots$, due to the mixing [Exercise 7.10]. Using these reduced density matrices, one can compute conditional density matrices and quantum entropies [13]. Clarke and Barron [17, p. 40] have an alternative expression for the redundancy in terms of conditional entropies, and it would be of interest to ascertain whether a quantum analogue of this expression exists.
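These entropy inequalities can be checked numerically from the spectrum of $\zeta_n(u)$. The sketch uses the eigenvalue formula (3.15) of Theorem 11, specialized to $w=q_u$. The closed form in the code is our own evaluation of that integral (substitute $r=2t-1$ and use Beta integrals), not a formula quoted from the paper. The sketch checks that $S(\zeta_n(u))$ never decreases with $n$, as required for a classically correlated state and its reductions, and that $S(\zeta_{a+b}(u))\le S(\zeta_a(u))+S(\zeta_b(u))$ (subadditivity).

```python
import math


def eigen_data(n, u):
    """Pairs (lambda_h, multiplicity) of zeta_n(u) for h = 0, ..., floor(n/2).

    lambda_h is our own Beta-integral evaluation of (3.15) for w = q_u:
    2^(3-2u) Gamma(5/2-u) Gamma(n-h+2-u) Gamma(h+1-u) / (sqrt(pi) Gamma(1-u) Gamma(n+4-2u)).
    """
    pairs = []
    for h in range(n // 2 + 1):
        log_lam = ((3 - 2 * u) * math.log(2) + math.lgamma(2.5 - u)
                   + math.lgamma(n - h + 2 - u) + math.lgamma(h + 1 - u)
                   - 0.5 * math.log(math.pi) - math.lgamma(1 - u) - math.lgamma(n + 4 - 2 * u))
        mult = (n - 2 * h + 1) ** 2 * math.comb(n + 1, h) // (n + 1)  # always an integer
        pairs.append((math.exp(log_lam), mult))
    return pairs


def entropy(n, u):
    """von Neumann entropy S(zeta_n(u)) in nats."""
    return -sum(m * lam * math.log(lam) for lam, m in eigen_data(n, u))


if __name__ == "__main__":
    for u in (-1.0, 0.0, 0.5):
        print(u, [round(entropy(n, u), 6) for n in range(1, 7)])
```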

Let us note that the theorem of Clarke and Barron utilizes the uniform convergence property of the asymptotic expansion of the relative entropy (Kullback-Leibler divergence). Condition 2 in their paper [17] is, therefore, crucial. It assumes, as is typically the case classically, that the matrix of second derivatives, $J(\theta)$, of the relative entropy is identical to the Fisher information matrix $I(\theta)$. In the quantum domain, however, in general $J(\theta)\ge I(\theta)$, where $J(\theta)$ is the matrix of second derivatives of the quantum relative entropy (1.3) and $I(\theta)$ is the symmetric logarithmic derivative Fisher information matrix. Equality holds only in special cases. For instance, $J(\theta)>I(\theta)$ does hold if $r\ne0$ for the situation considered in this paper. The volume element of the Kubo-Mori/Bogoliubov (monotone) metric [^, ^] is given by $\sqrt{\det J(\theta)}$. For the two-level quantum systems, this can be normalized to be a member ($u=1/2$) of a one-parameter family of probability densities

$$\frac{(1-u)\,\Gamma(5/2-u)\,r\log\big((1+r)/(1-r)\big)\sin\vartheta}{\pi^{3/2}(3-2u)\,\Gamma(1-u)\,(1-r^2)^u},\qquad-\infty<u<1,\tag{3.12}$$

which could be studied, it is presumed, in the same manner as the family $q_u$ (cf. (1.7) and (2.5)) analyzed here. These two families can be seen to differ, up to the normalization factor, by the replacement of $\log((1+r)/(1-r))$ in (3.12) by, simply, $r$. (These last two expressions are, of course, equal for $r=0$.) In general, the volume element of a monotone metric over the two-level quantum systems is of the form [^, eq. 3.17]

$$\frac{r^2\sin\vartheta}{f\big((1-r)/(1+r)\big)\,(1-r^2)^{1/2}\,(1+r)},\tag{3.13}$$

where $f:\mathbb{R}^+\to\mathbb{R}^+$ is an operator monotone function such that $f(1)=1$ and $f(t)=t\,f(1/t)$. For $f(t)=(1+t)/2$, one recovers the volume element ($\sqrt{\det I(\theta)}$) of the metric of the symmetric logarithmic derivative, and for $f(t)=(t-1)/\log t$, that ($\sqrt{\det J(\theta)}$) of the Kubo-Mori/Bogoliubov metric [^]. (It would appear, then, that the only member of the family $q_u$ proportional to the volume element of a monotone metric is $q_{0.5}$, that is, (1.6). The maximin result we have obtained above, corresponding to $u\approx0.531267$, the solution of (3.11), would then appear unlikely to extend globally beyond the family. Of course, a similar remark could be made in regard to the minimax, corresponding to $u\approx0.542593$, as shown above.) While $J(\theta)$ can be generated from the relative entropy (1.3) (which is a limiting case of the $\alpha$-entropies [^]), $I(\theta)$ is similarly obtained from [^, eq. 3.16]

$$\operatorname{Tr}\rho_1(\log\rho_1-\log\rho_2)^2.\tag{3.14}$$
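The normalization of (3.12) and the two specializations of (3.13) can be verified numerically. In this sketch (ours; names hypothetical) the angular integral $\int_0^{2\pi}\int_0^\pi\sin\vartheta\,d\vartheta\,d\phi=4\pi$ is done by hand, and the radial integral uses the substitution $r=1-s^2$ with the midpoint rule to handle the integrable singularity at $r=1$. We take (3.13) to carry the factor $r^2\sin\vartheta$ (our reading of the formula), and drop $\sin\vartheta$ throughout.

```python
import math


def bkm_family_radial(r, u):
    """(3.12) with the factor sin(theta) removed (0 < r < 1, u < 1)."""
    c = (1 - u) * math.exp(math.lgamma(2.5 - u) - math.lgamma(1 - u)) / (math.pi ** 1.5 * (3 - 2 * u))
    return c * r * math.log((1 + r) / (1 - r)) / ((1 - r) * (1 + r)) ** u


def total_mass(u, m=100000):
    """Integral of (3.12) over the ball, via r = 1 - s^2 (dr = 2 s ds) and the midpoint rule in s."""
    c = (1 - u) * math.exp(math.lgamma(2.5 - u) - math.lgamma(1 - u)) / (math.pi ** 1.5 * (3 - 2 * u))
    total = 0.0
    for i in range(m):
        s = (i + 0.5) / m
        one_minus, one_plus = s * s, 2 - s * s  # 1 - r and 1 + r, computed without cancellation
        r = 1 - one_minus
        total += c * r * math.log(one_plus / one_minus) / (one_minus * one_plus) ** u * 2 * s
    return 4 * math.pi * total / m


def monotone_volume(r, f):
    """(3.13) with the factor sin(theta) removed."""
    t = (1 - r) / (1 + r)
    return r * r / (f(t) * math.sqrt(1 - r * r) * (1 + r))


def f_sld(t):
    return (1 + t) / 2


def f_bkm(t):
    return (t - 1) / math.log(t)
```

The checks confirm that $f_{\rm SLD}$ gives $r^2/\sqrt{1-r^2}$, that $f_{\rm BKM}$ gives $r\log\frac{1+r}{1-r}/(2\sqrt{1-r^2})$, and that (3.12) at $u=1/2$ is a constant multiple of the latter.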

It might prove of interest to repeat the general line of analysis carried out in this paper, but with the use of (3.14) rather than (1.3). Also of importance might be an analysis in which the relative entropy (1.3) is retained, but the family (3.12), based on the Kubo-Mori/Bogoliubov metric, is used instead of $q_u$. Let us also indicate that if one equates the asymptotic redundancy formula (1.1) of Clarke and Barron (using $w(\theta)=q_u(x,y,z)$) to that derived here, (2.37), neglecting the residual terms, solves for $\det I(\theta)$, and takes the square root of the result, one obtains a prior of the form (3.13) based on the monotone function $f(t)=t^{t/(t-1)}/e$. (Let us note that the reciprocal of the related "Morozova-Chentsov" function $c(x,y)=1/(y\,f(x/y))$ is, in this case, the exponential mean [4^] of $x$ and $y$, while for the minimal monotone metric, the reciprocal of the Morozova-Chentsov function is the arithmetic mean. It is, therefore, quite interesting from an information-theoretic point of view that these are, in fact, the only two means which furnish additive quasiarithmetic average codeword lengths [0, p. 157]. Also, it appears to be an important and challenging question, bearing upon the relationship between classical and quantum probability, to determine whether or not there exists a family of probability distributions over the Bloch sphere which yields, as the volume element of its Fisher information matrix, a prior of the form (3.13) with the noted $f(t)=t^{t/(t-1)}/e$.)
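The claims about $f(t)=t^{t/(t-1)}/e$ can be checked directly. This sketch (ours; names hypothetical) verifies $f(t)=t\,f(1/t)$ and $f(1)=1$ (as a limit), and that $y\,f(x/y)$ is the identric ("exponential") mean $\frac1e(x^x/y^y)^{1/(x-y)}$, which lies between the geometric and arithmetic means. It also checks that equating (1.1), with $w=q_u$, to (2.37) gives $\sqrt{\det I}=e\,(1-r^2)^{-1}\big(\frac{1-r}{1+r}\big)^{1/(2r)}$ per unit Cartesian volume, and that this matches (3.13) for this $f$. That last formula is our own algebra and assumes the standard Clarke-Barron form of (1.1); the parameter $u$ cancels out of it.

```python
import math


def f_exp(t):
    """f(t) = t^(t/(t-1)) / e, extended by continuity with f(1) = 1."""
    if t == 1.0:
        return 1.0
    return math.exp(t * math.log(t) / (t - 1) - 1)


def identric_mean(x, y):
    """(1/e) (x^x / y^y)^(1/(x - y)) for positive x != y."""
    return math.exp((x * math.log(x) - y * math.log(y)) / (x - y) - 1)


def sqrt_det_I_from_equating(r):
    """sqrt(det I) per dx dy dz from equating (1.1) (w = q_u) with (2.37); our own algebra."""
    return math.e / (1 - r * r) * ((1 - r) / (1 + r)) ** (1 / (2 * r))


def monotone_volume_cartesian(r, f):
    """(3.13) divided by r^2 sin(theta), i.e. the same volume element per dx dy dz."""
    t = (1 - r) / (1 + r)
    return 1 / (f(t) * math.sqrt(1 - r * r) * (1 + r))


if __name__ == "__main__":
    print(identric_mean(2.0, 5.0), math.sqrt(10.0), 3.5)
    for r in (0.1, 0.5, 0.9):
        print(r, sqrt_det_I_from_equating(r), monotone_volume_cartesian(r, f_exp))
```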



As we said in the Introduction, ideally we would like to start with a (suitably well-behaved) arbitrary probability density on the unit ball, determine the relative entropy of $\overset{n}{\otimes}\rho$ with respect to the average of $\overset{n}{\otimes}\rho$ over the probability density, find its asymptotics, and finally, among all such probability densities, find the one(s) for which the minimax and maximin are attained. In this regard, we wish to mention that a suitable combination of results and computations from Sec. with basic facts from the representation theory of $SU(2)$ (cf. [5g, TT] for more information on that topic) yields the following result.

Theorem 11. Let $w$ be a spherically symmetric probability density on the unit ball, i.e., $w=w(x,y,z)$ depends only on $r=\sqrt{x^2+y^2+z^2}$. Furthermore, let $\zeta_n(w)$ be the average $\int_{x^2+y^2+z^2\le1}\big(\overset{n}{\otimes}\rho\big)\,w\,dx\,dy\,dz$. Then the eigenvalues of $\zeta_n(w)$ are

$$\lambda_h=\frac{\pi}{2^{n-1}(n-2h+1)}\int_{-1}^{1}r\,(1+r)^{n-h+1}(1-r)^h\,w(|r|)\,dr,\qquad h=0,1,\dots,\lfloor n/2\rfloor,\tag{3.15}$$

with respective multiplicities

$$\frac{(n-2h+1)^2}{n+1}\binom{n+1}{h},$$

and corresponding eigenspaces spanned by $\{v_{h,s}(P): h\le s\le n-h,\ P\text{ a ballot path from }(0,0)\text{ to }(n,n-2h)\}$, which were described in Sec.

The relative entropy of $\overset{n}{\otimes}\rho$ with respect to $\zeta_n(w)$ is given by (2.31), with $\lambda_h$ as given in (3.15).
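To see (3.15) at work, this sketch (ours; names hypothetical) evaluates the integral by Simpson's rule for $w=q_u$ with $u=-1$, where the integrand is a polynomial. It then compares the result with the closed form $\lambda_h=\frac{2^{3-2u}\Gamma(5/2-u)\Gamma(n-h+2-u)\Gamma(h+1-u)}{\sqrt\pi\,\Gamma(1-u)\,\Gamma(n+4-2u)}$, which we obtained via the substitution $r=2t-1$ and Beta integrals; this closed form is an assumption of the illustration, not a formula stated in the theorem. Finally, it checks that the eigenvalues, counted with the stated multiplicities, sum to 1.

```python
import math


def q_u_radial(r, u):
    """q_u of (1.7) as a function of the radius r; this sketch assumes u <= 0."""
    c = math.exp(math.lgamma(2.5 - u) - math.lgamma(1 - u)) / math.pi ** 1.5
    return c * max(0.0, 1 - r * r) ** (-u)


def lambda_simpson(n, h, w, m=20000):
    """Composite Simpson's rule (m even) for (3.15) with a radial density w."""
    total = 0.0
    for i in range(m + 1):
        r = -1.0 + 2.0 * i / m
        weight = 1 if i in (0, m) else (4 if i % 2 else 2)
        total += weight * r * (1 + r) ** (n - h + 1) * (1 - r) ** h * w(abs(r))
    return math.pi / (2 ** (n - 1) * (n - 2 * h + 1)) * total * (2.0 / m) / 3


def lambda_closed(n, h, u):
    """Our Beta-integral evaluation of (3.15) for w = q_u."""
    return math.exp((3 - 2 * u) * math.log(2) + math.lgamma(2.5 - u)
                    + math.lgamma(n - h + 2 - u) + math.lgamma(h + 1 - u)
                    - 0.5 * math.log(math.pi) - math.lgamma(1 - u) - math.lgamma(n + 4 - 2 * u))


def multiplicity(n, h):
    """(n - 2h + 1)^2 binom(n + 1, h) / (n + 1), which is always an integer."""
    return (n - 2 * h + 1) ** 2 * math.comb(n + 1, h) // (n + 1)


if __name__ == "__main__":
    for h in range(3):
        print(h, lambda_simpson(5, h, lambda r: q_u_radial(r, -1.0)), lambda_closed(5, h, -1.0))
```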

We hope that this theorem will enable us to determine the asymptotics of the relative entropy and, eventually, to find, at least within the family of spherically symmetric (that is, unitarily invariant) probability densities on the unit ball, the corresponding minimax and maximin redundancies. Doing so would resolve the outstanding question of whether these two redundancies, in fact, coincide, as classical results would suggest.

4. Summary

Clarke and Barron [17, 18] (cf. [^]) have derived several forms of asymptotic redundancy for arbitrarily parameterized families of probability distributions. We were motivated to undertake this study by the possibility that their results may generalize, in some as yet not fully understood fashion, to the quantum domain of noncommutative probability. (Thus, rather than probability densities, we have been concerned here with density matrices.) So far, we have only been able to examine this possibility in a somewhat restricted manner. By this, we mean that we have limited our consideration to two-level quantum systems (rather than $n$-level ones, $n>2$), and, for the case $n=2$, we have studied (what has proven to be) an analytically tractable one-parameter family of possible prior probability densities, $q_u$, $-\infty<u<1$ (rather than the totality of arbitrary probability densities). Consequently, our results cannot be as definitive in nature as those of Clarke and Barron. Nevertheless, the analyses presented here reveal that our trial candidate ($q_{0.5}$, that is, (1.6)) for the quantum counterpart of the Jeffreys' prior closely approximates those probability distributions which we have, in fact, found to yield the minimax ($q_{0.542593}$) and maximin ($q_{0.531267}$) for our one-parameter family ($q_u$).

Future research might be devoted to expanding the family of probability distributions used to generate the Bayesian density matrices for $n=2$, as well as to similarly studying the $n$-level quantum systems ($n>2$). (In this regard, we have examined the situation in which $n=2^m$ and the only $n\times n$ density matrices considered are simply the tensor products of $m$ identical $2\times2$ density matrices. Surprisingly, for $m=2,3$, the associated trivariate candidate quantum Jeffreys' priors, taken, as throughout this study, to be proportional to the volume elements of the metrics of the symmetric logarithmic derivative (cf. [Q]), have been found to be improper (nonnormalizable) over the Bloch sphere. The minimality of such metrics is guaranteed, however, only if "the whole state space of a spin is parameterized" [0].) In all such cases, it will be of interest to evaluate the characteristics of the relevant candidate quantum Jeffreys' prior vis-à-vis all other members of the family of probability distributions employed over the $(n^2-1)$-dimensional convex set of $n\times n$ density matrices.

We have also conducted analyses parallel to those reported above, but having, ab initio, set either $x$ or $y$ to zero in the $2\times2$ density matrices (1.4). This places us in the realm of real (as opposed to complex, that is, standard or conventional) quantum mechanics. (Of course, setting both $x$ and $y$ to zero would return us to a strictly classical situation, in which the results of Clarke and Barron 113, as applied to binomial distributions, would be directly applicable.) Although we have, on the basis of detailed computations, developed strong conjectures as to the nature of the associated results, we have not, at this stage of our investigation, succeeded in formally demonstrating their validity.

In conclusion, again in analogy to classical results, we would like to raise the possibility that the quantum asymptotic redundancies derived here might prove of value in deriving formulas for the stochastic complexity [^] (cf. [^]), that is, the shortest description length, of a string of $n$ quantum bits. The competing possible models for the data string might be taken to be the $2\times2$ density matrices ($\rho$) corresponding to different values of $r$ or, equivalently, different values of the von Neumann entropy, $S(\rho)$.



Appendix: The quantum Bayes estimator achieves the minimum average entropy

Let $P_\theta$, $\theta\in\Theta$, be a family of density matrices, and let $w(\theta)$, $\theta\in\Theta$, be a probability density on $\Theta$.

Theorem 12. The minimum

$$\min_Q\int w(\theta)\,S(P_\theta,Q)\,d\theta,$$

taken over all density matrices $Q$, is achieved by $M=\int w(\theta)\,P_\theta\,d\theta$.






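Proof. We look at the difference

$$\int w(\theta)\,S(P_\theta,Q)\,d\theta-\int w(\theta)\,S(P_\theta,M)\,d\theta$$

and show that it is nonnegative. Indeed,

$$\begin{aligned}
\int w(\theta)S(P_\theta,Q)\,d\theta-\int w(\theta)S(P_\theta,M)\,d\theta
&=\int w(\theta)\operatorname{Tr}(P_\theta\log P_\theta-P_\theta\log Q)\,d\theta-\int w(\theta)\operatorname{Tr}(P_\theta\log P_\theta-P_\theta\log M)\,d\theta\\
&=\int w(\theta)\operatorname{Tr}\big(P_\theta(\log M-\log Q)\big)\,d\theta\\
&=\operatorname{Tr}\Big(\Big(\int w(\theta)P_\theta\,d\theta\Big)(\log M-\log Q)\Big)\\
&=\operatorname{Tr}\big(M(\log M-\log Q)\big)\\
&=S(M,Q)\ge0,
\end{aligned}$$

since relative entropies of density matrices are nonnegative [38, bottom of p. 17]. □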
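For two-level systems the identity in this proof can be checked concretely. Writing qubit states by Bloch vectors, $\log\rho_b=\alpha I+\beta\,\hat b\cdot\sigma$, so $\operatorname{Tr}\rho_a\log\rho_b=\alpha+\beta\,a\cdot\hat b$ is affine in $a$; this is exactly the linearity the proof uses. The sketch below is our own illustration; the discrete prior over three full-rank qubit states is hypothetical, and the names are not from the paper. It checks that the gap between the average relative entropies for $Q$ and for $M$ equals $S(M,Q)\ge0$.

```python
import math


def _log_coeffs(length):
    """log rho = alpha*I + beta*(unit Bloch vector . sigma) for a qubit of Bloch radius < 1."""
    log_plus = math.log((1 + length) / 2)
    log_minus = math.log((1 - length) / 2)
    return (log_plus + log_minus) / 2, (log_plus - log_minus) / 2


def tr_rho_log_sigma(a, b):
    """Tr(rho_a log rho_b) for Bloch vectors a, b; uses Tr(sigma_i sigma_j) = 2 delta_ij."""
    b_len = math.sqrt(sum(x * x for x in b))
    alpha, beta = _log_coeffs(b_len)
    if b_len == 0.0:
        return alpha  # rho_b = I/2, so log rho_b = alpha*I
    return alpha + beta * sum(x * y for x, y in zip(a, b)) / b_len


def rel_entropy(a, b):
    """S(rho_a, rho_b) = Tr rho_a (log rho_a - log rho_b), as in (1.3)."""
    return tr_rho_log_sigma(a, a) - tr_rho_log_sigma(a, b)


def average_rel_entropy(states, weights, q):
    """sum_k w_k S(P_k, Q) for a discrete prior."""
    return sum(w * rel_entropy(a, q) for a, w in zip(states, weights))


def bayes_mixture(states, weights):
    """Bloch vector of M = sum_k w_k P_k."""
    return tuple(sum(w * a[i] for a, w in zip(states, weights)) for i in range(3))


STATES = [(0.3, 0.1, -0.2), (-0.5, 0.4, 0.1), (0.0, -0.6, 0.7)]
WEIGHTS = [0.5, 0.3, 0.2]
M = bayes_mixture(STATES, WEIGHTS)

if __name__ == "__main__":
    for q in [(0.1, 0.2, 0.3), (-0.4, 0.0, 0.5), (0.0, 0.0, 0.0)]:
        gap = average_rel_entropy(STATES, WEIGHTS, q) - average_rel_entropy(STATES, WEIGHTS, M)
        print(q, gap, rel_entropy(M, q))
```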